Peptides for the specific binding of RNA targets

ABSTRACT

A recombinant polypeptide is described including at least one PUF RNA-binding domain capable of specifically binding to a cytosine RNA base. The PUF RNA-binding domain of the polypeptide includes at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁, wherein X₁ is selected from a defined group and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

TECHNICAL FIELD

This invention broadly relates to recombinant polypeptides comprising an RNA-binding domain capable of specifically binding to a desired RNA target molecule. The invention also relates to fusion proteins comprising the recombinant polypeptides and an effector domain; to isolated nucleic acids encoding same as well as recombinant vectors and host cells comprising nucleic acids encoding same; to compositions comprising the recombinant polypeptides or fusion proteins and use thereof; as well as methods of regulating expression of a gene and systems and kits for use therefor.

BACKGROUND ART

The following discussion of the background art is intended to facilitate an understanding of the present invention only. The discussion is not an acknowledgement or admission that any of the material referred to is or was part of the common general knowledge as at the priority date of the application.

The regulation of gene expression and cellular function in cells is controlled at many levels, including the regulation of the extent of chromatin structure, epigenetic control, transcriptional initiation and control of the rate thereof, messenger RNA (mRNA) transcript processing and modification, mRNA transport, mRNA transcript stability, translational initiation, control of transcript levels by small non-coding RNAs, post-translational modification, protein transport, and control of protein stability.

Antisense RNA (aRNA) and RNA interference (RNAi) technologies are well established tools for regulating gene expression through steric hindrance of translation and mRNA transcript degradation respectively. RNAi involves the introduction of short interfering RNA (siRNA) or microRNA (miRNA) into a cell, followed by the activation of RNAi cellular machinery and cleavage of a target messenger RNA (mRNA) transcript by RNA-induced silencing complex (RISC). However, the design of functional siRNA that is appropriately recognised by RNAi cellular machinery is highly complex and subject to various constraints (Tuschl, T. et al. (1999) Genes & Dev 13: 3191-3197). The siRNA must be 19-21 nucleotides in length and have a 2-nt 3′ overhang. The siRNA must exhibit limited G/C content and avoid consecutive stretches of the same base. Furthermore, the selectivity of strand loading into the RISC complex depends on the differential thermodynamic stabilities of the two ends of an siRNA duplex (Schwarz, S. D. et al. (2003) Cell 115:199-208), the less thermodynamically stable end being favoured for binding.

The design of pre-miRNA or miRNA to be processed into siRNA is further complicated by the requirement of secondary structural elements such as imperfectly base-paired stem regions flanked by free 5′ and 3′ ends and an unpaired loop region (Lund, E. and Dahlberg, J. E. (2006) Regulatory RNAs, Volume 71 of Cold Spring Harbor symposia on quantitative biology, CSHL Press). Thus, the constraints within which siRNA and miRNA molecules must be designed not only make the production of these molecules both challenging and complex but also limit the number and sequence of potential mRNA targets.

Despite the purported specificity of RNAi and aRNA technology for specific mRNA targets, cross-hybridization and non-specific binding can occur. In addition to the possibility for cross-hybridization of the antisense strand of siRNA to different mRNAs, siRNAs have demonstrated undesirable binding to various proteins (Bruckner, I. & Tremblay, G. A. (2000) Biochemistry 39: 11463-11466) causing significant nonspecific effects (Stein, C. A. (1995) Nat Med 1: 1119-1121). Moreover, the affinity of siRNA-mediated binding of activated RISC to target mRNA (RNA-RNA interaction) is limited to that provided by canonical Watson and Crick base pairing, with guanine-cytosine interactions restricted to the expected three intermolecular hydrogen bonds, and adenine-uracil or uracil-guanine to the expected two intermolecular hydrogen bonds. Further disadvantages of RNAi technology include the need for transfection reagents or delivery vehicles, low and variable transfection efficiency sometimes necessitating the use of multiple transfection steps, partial and transient gene suppression effects, dependence upon processing by RNAi machinery, the limitation of mechanism to mRNA transcript degradation, and undesirable siRNA hairpin formation. siRNA is also known to be a potent activator of the mammalian innate immune system or IFN response (Judge, A. et al. (2008) Human Gene Therapy 19: 111-124). The use of siRNA has been reported to cause an undesirable stimulation of immune activity and inflammatory response which may be further potentiated by the use of delivery vehicles, resulting in significant side effects due to excessive cytokine release and associated inflammatory syndromes. The potential for siRNA-based drugs to be rendered immunogenic is thus a cause for concern and has implications for both the development of siRNA-based drugs and the interpretation of gene-silencing effects elicited by siRNA (Judge, supra).

Alternatives to RNAi include the use of naturally occurring RNA-binding proteins which have been found to play essential roles in the regulation of gene expression. However, the modes by which such proteins bind RNA are idiosyncratic, restricted to sequence specific interactions, and difficult to predict so that their general use in biotechnological and medical applications is restricted. Information on the physiological targets of many RNA-binding proteins is limited and the binding of most RNA-binding proteins to their targets is reported to be idiosyncratic or to require a combination of sequence and structural features such that their binding cannot be generally applied to other targets.

The PUF family of proteins, named for Drosophila Pumilio (Pum) and Caenorhabditis elegans FBF (fem-3 binding factor), is an evolutionarily conserved family of RNA-binding proteins (for a review see Spassov, D. S. & Jurecic, R. (2003) IUBMB Life, 55: 359-366). PUF proteins contain an RNA-binding domain, known as the PUF domain or the Pumilio homology domain (PUM-HD), typically composed of eight tandem imperfect repeats of 36 amino acids plus conserved N and C-terminal flanking regions, aligned in tandem to form an extended curved arc-like molecule (Edwards, T. A. et al. (2001) Cell 105: 281-289). Target RNA binds to the inner concave surface of the protein, each of the eight repeats contacting a separate RNA base via three conserved amino acid residues positioned in the middle of the repeats (Wang, X. et al. (2002) Cell 110: 501-512). PUF proteins regulate RNA stability and translation by binding to specific sequences, such as the nanos response element (NRE), that are most often found in 3′ untranslated regions of target mRNAs (Gupta, supra). The PUM1 NRE sequence is composed only of adenine, guanine or uracil.

The modular nature of the PUF-RNA interaction has been used to rationally engineer the binding specificity of PUF domains (Cheong, C. G. & Hall, T. M. (2006) PNAS 103: 13635-13639; Wang, X. et al (2002) Cell 110: 501-512). However, only the successful design of PUF domains with repeats that recognize adenine, guanine or uracil has been reported to date (Cheong, supra; Wang, supra). The specificity of individual repeats recognizing adenine, guanine or uracil was switched by mutating only the positions that make contacts with the Watson-Crick edge of the base, providing engineered PUF domains capable of recognising endogenous RNA sequences composed of adenine, guanine or uracil (Wang, Y. et al (2009) Nat Methods 6: 825-830; Tilsner, J. et al (2009) Plant 57: 758-770; Ozawa, T. et al (2007) Nat Methods 4: 413-419). Most of the known naturally occurring target sequences of PUM1, such as the NRE sequence, are composed of only adenine, guanine or uracil. However, despite focussed attempts to engineer PUF domains, the use of PUF domains designed with repeats that recognize these bases has remained substantially limited to sequences composed only of adenine, guanine or uracil.

There thus exists a continued need for alternative methods for the specific regulation of gene expression and for agents for use therefor.

SUMMARY OF INVENTION

According to the invention there is provided a recombinant polypeptide comprising at least one PUF RNA-binding domain capable of specifically binding to a cytosine RNA base.

Further features of the invention provide for the PUF RNA-binding domain to comprise at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein

- X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K);
- X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N);
- X₃ is selected from the group including glycine (G) and alanine (A);
- X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C);
- X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N);
- X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V);
- X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V);
- X₈ is arginine (R);
- X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H);
- X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and
- X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V);

and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.
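By way of illustration only, the positional groups of the general formula can be written as a lookup table and used to check whether a candidate 11-residue sequence falls within the recited groups. The following is a minimal sketch: the names `ALLOWED` and `matches_general_formula` are hypothetical, and because the groups are recited with open "including" language, a match indicates composition only and says nothing about whether a given motif actually binds cytosine.

```python
# Residues recited for positions X1..X11 of the general formula, in one-letter code.
# Hypothetical helper names; a sketch, not part of the described invention.
ALLOWED = [
    set("QVMPEK"),    # X1
    set("HFYN"),      # X2
    set("GA"),        # X3
    set("GASTC"),     # X4
    set("RYHN"),      # X5
    set("FLV"),       # X6
    set("ILV"),       # X7
    set("R"),         # X8
    set("LKRQH"),     # X9
    set("KFACIVLM"),  # X10
    set("LFIV"),      # X11
]


def matches_general_formula(motif):
    """Return True if every residue of an 11-mer is in the recited group for its position."""
    motif = motif.upper()
    if len(motif) != len(ALLOWED):
        return False
    return all(residue in group for residue, group in zip(motif, ALLOWED))


if __name__ == "__main__":
    print(matches_general_formula("QYGGYVIRHVL"))  # every position is within its group
    print(matches_general_formula("QYGGYVIKHVL"))  # X8 is lysine, not arginine
```

A set per position keeps the check a direct transcription of the recited groups, so it can be compared line by line against the list above.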

In one embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is glutamine (Q); X₂ is tyrosine (Y); X₃ is glycine (G); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is tyrosine (Y); X₆ is valine (V); X₇ is isoleucine (I); X₈ is arginine (R); X₉ is histidine (H); X₁₀ is valine (V); and X₁₁ is leucine (L).

In a preferred embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is glutamine (Q); X₂ is tyrosine (Y); X₃ is glycine (G); X₄ is glycine (G); X₅ is tyrosine (Y); X₆ is valine (V); X₇ is isoleucine (I); X₈ is arginine (R); X₉ is histidine (H); X₁₀ is valine (V); and X₁₁ is leucine (L).

In another preferred embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is glutamine (Q); X₂ is tyrosine (Y); X₃ is glycine (G); X₄ is alanine (A); X₅ is tyrosine (Y); X₆ is valine (V); X₇ is isoleucine (I); X₈ is arginine (R); X₉ is histidine (H); X₁₀ is valine (V); and X₁₁ is leucine (L).

In another preferred embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is glutamine (Q); X₂ is tyrosine (Y); X₃ is glycine (G); X₄ is serine (S); X₅ is tyrosine (Y); X₆ is valine (V); X₇ is isoleucine (I); X₈ is arginine (R); X₉ is histidine (H); X₁₀ is valine (V); and X₁₁ is leucine (L).

In another preferred embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is glutamine (Q); X₂ is tyrosine (Y); X₃ is glycine (G); X₄ is threonine (T); X₅ is tyrosine (Y); X₆ is valine (V); X₇ is isoleucine (I); X₈ is arginine (R); X₉ is histidine (H); X₁₀ is valine (V); and X₁₁ is leucine (L).

In another preferred embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is glutamine (Q); X₂ is tyrosine (Y); X₃ is glycine (G); X₄ is cysteine (C); X₅ is tyrosine (Y); X₆ is valine (V); X₇ is isoleucine (I); X₈ is arginine (R); X₉ is histidine (H); X₁₀ is valine (V); and X₁₁ is leucine (L).

Alternatively, the PUF RNA-binding domain may comprise at least one RNA base-binding motif of the general formula QYGXYVIRHVL wherein X is an amino acid with a small or nucleophilic side chain.

In an embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula QYGXYVIRHVL wherein X is an amino acid selected from the group comprising glycine (G), alanine (A), serine (S), threonine (T), and cysteine (C).
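As an illustrative sketch only, the five motifs covered by the formula QYGXYVIRHVL with X chosen from G, A, S, T and C can be enumerated, and a polypeptide sequence scanned for them, as follows. The function names and the example input sequence are hypothetical and are not taken from the sequence listing.

```python
import re

TEMPLATE = "QYGXYVIRHVL"          # general formula; X marks the variable position
SMALL_OR_NUCLEOPHILIC = "GASTC"   # glycine, alanine, serine, threonine, cysteine

# Regular expression equivalent to the template with X restricted to the five residues.
MOTIF_PATTERN = re.compile(r"QYG[GASTC]YVIRHVL")


def enumerate_motifs(template=TEMPLATE, choices=SMALL_OR_NUCLEOPHILIC):
    """Expand the single X in the template into one concrete motif per residue."""
    return [template.replace("X", residue, 1) for residue in choices]


def find_motifs(polypeptide):
    """Return (zero-based start index, motif) for each occurrence in a sequence."""
    return [(m.start(), m.group()) for m in MOTIF_PATTERN.finditer(polypeptide.upper())]


if __name__ == "__main__":
    for motif in enumerate_motifs():
        print(motif)
    # Made-up flanking residues around one motif, purely for demonstration.
    print(find_motifs("MMQYGSYVIRHVLAA"))
```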

Further features of the invention provide for the PUF RNA-binding domain to comprise a plurality of RNA base-binding motifs, at least one of which is capable of specifically binding to a cytosine RNA base; and further for the plurality of RNA base-binding motifs to comprise a first RNA base-binding motif capable of specifically binding to a cytosine RNA base and a second RNA base-binding motif capable of specifically binding to an RNA base comprising one of adenosine, guanine, or uracil, wherein the first and second RNA base-binding motifs are synergistically operable to specifically bind the RNA bases.

In another embodiment of the invention, the PUF RNA-binding domain comprises a plurality of consecutively ordered RNA base-binding motifs synergistically operable to bind a target RNA molecule with a target RNA sequence, each RNA base-binding motif capable of specifically binding to a cytosine, adenosine, guanine, or uracil RNA base, wherein the consecutive order of the RNA base-binding motifs corresponds with the consecutive order of the RNA bases in the target RNA sequence.
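The correspondence between motif order and base order described above can be sketched, under stated assumptions, as a simple mapping from a target RNA sequence to an ordered list of motif specificities. The labels, the function name `plan_motif_order` and the `antiparallel` flag are all hypothetical; in particular, this passage does not state whether the N-to-C order of motifs follows the 5′ to 3′ order of the RNA or its reverse, so the flag is offered only as a knob for either reading.

```python
# Hypothetical labels for the four motif specificities named in the text.
BASE_TO_SPECIFICITY = {
    "C": "cytosine-binding motif",
    "A": "adenosine-binding motif",
    "G": "guanine-binding motif",
    "U": "uracil-binding motif",
}


def plan_motif_order(target_rna, antiparallel=False):
    """Map each base of the target RNA to the specificity of the motif that reads it.

    antiparallel is an assumption-only switch: when True the list is reversed,
    for the case where the first motif is taken to contact the last base.
    """
    bases = target_rna.upper()
    unknown = sorted(set(bases) - set(BASE_TO_SPECIFICITY))
    if unknown:
        raise ValueError("not an RNA base: " + ", ".join(unknown))
    plan = [BASE_TO_SPECIFICITY[base] for base in bases]
    return plan[::-1] if antiparallel else plan


if __name__ == "__main__":
    for position, motif in enumerate(plan_motif_order("UGC"), start=1):
        print(position, motif)
```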

In a preferred embodiment of the invention, the plurality of consecutively ordered RNA base-binding motifs comprises at least one first RNA base-binding motif having a sequence of any one of SEQ ID NOS: 1-5 and at least one second RNA base-binding motif having a sequence of any one of SEQ ID NOS: 6-13.

Further features of the invention provide for the plurality of RNA base-binding motifs to comprise between 2 and 40 RNA base-binding motifs. Preferably, the plurality of RNA base-binding motifs comprise between 8 and 16 RNA base-binding motifs.

In a preferred embodiment of the invention, the recombinant polypeptide has a sequence of any one of SEQ ID NOS: 14-18 or any one of SEQ ID NOS: 24-30 or SEQ ID NO: 41.

In a more preferred embodiment of the invention, the recombinant polypeptide has a sequence of any one of SEQ ID NOS: 14-18.

In a further preferred embodiment of the invention, the recombinant polypeptide has a sequence of any one of SEQ ID NOS: 24-30.

In a further preferred embodiment of the invention, the recombinant polypeptide has a sequence of SEQ ID NO: 41.

In a further preferred embodiment of the invention, the amino acid spacers are derived from SEQ ID NO 39, or part thereof.

The invention also provides a fusion protein comprising at least one PUF RNA-binding domain capable of specifically binding to a cytosine RNA base, and an effector domain.

In an embodiment of the invention, the PUF RNA-binding domain of the fusion protein comprises a plurality of consecutively ordered RNA base-binding motifs synergistically operable to bind a target RNA molecule with a target RNA sequence, each RNA base-binding motif capable of specifically binding to a cytosine, adenosine, guanine, or uracil RNA base, wherein the consecutive order of the RNA base-binding motifs corresponds with the consecutive order of the RNA bases in the target RNA sequence.

In a preferred embodiment of the invention the plurality of consecutively ordered RNA base-binding motifs of the PUF RNA-binding domain of the fusion protein comprises at least one first RNA base-binding motif having a sequence of any one of SEQ ID NOS: 1-5 and at least one second RNA base-binding motif having a sequence of any one of SEQ ID NOS: 6-13.

The invention further provides for an isolated nucleic acid encoding the recombinant polypeptide or the fusion protein of the invention.

The isolated nucleic acid may have a sequence selected from the group comprising any one of SEQ ID NOS: 19-23, any one of SEQ ID NOS: 31-37, SEQ ID NO: 40, and a sequence at least 80% homologous to any one of SEQ ID NOS: 19-23 and 31-37.
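The text does not specify how the 80% homology figure is to be computed. Purely as an illustration, and assuming that homology is approximated by percent identity over two sequences that have already been aligned to equal length (with "-" marking gaps), the threshold could be checked as sketched below. The function names are hypothetical, and columns that are gaps in both sequences are skipped by assumption.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of compared alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column is a gap in both sequences; not counted (assumption)
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * identical / compared


def meets_threshold(aligned_a, aligned_b, threshold=80.0):
    """True if percent identity is at least the threshold (80% as recited in the text)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Short made-up sequences, for demonstration only.
    print(percent_identity("ATGCATGCAT", "ATGCATGCAA"))
    print(meets_threshold("ATGCA", "ATG-A"))
```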

In a preferred embodiment of the invention, the isolated nucleic acid has a sequence selected from the group comprising any one of SEQ ID NOS: 19-23.

In another preferred embodiment of the invention, the isolated nucleic acid has a sequence selected from the group comprising any one of SEQ ID NOS: 31-37.

In a further preferred embodiment of the invention, the isolated nucleic acid has a sequence of SEQ ID NO: 40.

The invention yet further provides a recombinant vector comprising nucleic acid encoding the recombinant polypeptide or the fusion protein of the invention.

The invention extends to host cells comprising nucleic acid encoding the recombinant polypeptide or the fusion protein of the invention; and for the nucleic acid of the host cell to have a sequence selected from the group comprising any one of SEQ ID NOS: 19-23, any one of SEQ ID NOS: 31-37, SEQ ID NO: 40, and a sequence at least 80% homologous to any one of SEQ ID NOS: 19-23 and 31-37.

The host cells may be selected from a wide variety of suitable host cells, including prokaryotic and eukaryotic cells, and may be selected according to the chosen expression system such as bacterial, yeast, insect or mammalian expression systems. Regulating sequences for gene expression in the various expression systems may also be included in the host cells. By way of illustration, the nucleic acids of the preferred embodiments of the invention are adapted for yeast host cells, preferably Saccharomyces cerevisiae, more preferably Saccharomyces cerevisiae YBZ-1. However, it is understood that the scope of the invention is not limited thereby.

The invention also provides for a composition comprising the recombinant polypeptide of the invention or the fusion protein of the invention or the isolated nucleic acid of the invention or the recombinant vector of the invention.

The invention extends to the use of an effective amount of the recombinant polypeptide of the invention or the fusion protein of the invention or the isolated nucleic acid of the invention or the recombinant vector of the invention in the manufacture of a medicament for regulating gene expression.

The invention further provides for a method of regulating expression of a gene in a cell, the method comprising the step of introducing into the cell a recombinant polypeptide comprising a PUF RNA-binding domain comprising a plurality of consecutively ordered RNA base-binding motifs synergistically operable to bind a target RNA molecule with a target RNA sequence, each RNA base-binding motif capable of specifically binding to a cytosine, adenosine, guanine, or uracil RNA base, wherein the consecutive order of the RNA base-binding motifs corresponds with the consecutive order of the RNA bases in the target RNA sequence; wherein the specific binding of the recombinant polypeptide to the target RNA alters the expression of the gene.

The invention further provides for a pharmaceutical composition comprising the recombinant polypeptide of the invention or the fusion protein of the invention or the isolated nucleic acid of the invention or the recombinant vector of the invention.

The invention also provides a system for regulating gene expression comprising

- (a) a modular set of isolated nucleic acids encoding a plurality of RNA base-binding motifs, the set including: at least one isolated nucleic acid encoding an RNA base-binding motif capable of specifically binding to a cytosine RNA base and at least one isolated nucleic acid encoding an RNA base-binding motif capable of specifically binding to an adenosine RNA base or a guanine RNA base or a uracil RNA base;
- (b) means for annealing the isolated nucleic acids of the modular set in a desired sequence to produce an isolated nucleic acid encoding an expressible recombinant polypeptide comprising a PUF RNA-binding domain having a plurality of consecutively ordered RNA base-binding motifs; and
- (c) a target RNA molecule with a target RNA sequence, wherein the consecutive order of the RNA base-binding motifs corresponds with the consecutive order of the RNA bases in the target RNA sequence.

The invention extends to a kit for regulating gene expression comprising

- (a) a modular set of isolated nucleic acids encoding a plurality of RNA base-binding motifs, the set including: at least one isolated nucleic acid encoding an RNA base-binding motif capable of specifically binding to a cytosine RNA base and at least one isolated nucleic acid encoding an RNA base-binding motif capable of specifically binding to an adenosine RNA base or a guanine RNA base or a uracil RNA base;
- (b) means for annealing the isolated nucleic acids of the modular set in a desired sequence to produce an isolated nucleic acid encoding a recombinant polypeptide comprising a PUF RNA-binding domain having a plurality of consecutively ordered RNA base-binding motifs; and
- (c) optionally, a target RNA molecule with a target RNA sequence, wherein the consecutive order of the RNA base-binding motifs corresponds with the consecutive order of the RNA bases in the target RNA sequence.

In a preferred embodiment of the invention, the plurality of consecutively ordered RNA base-binding motifs comprises at least one first RNA base-binding motif having a sequence of any one of SEQ ID NOS: 1-5 and at least one second RNA base-binding motif having a sequence of any one of SEQ ID NOS: 6-12.

Further features of the invention provide for the plurality of RNA base-binding motifs to comprise between 8 and 21 RNA base-binding motifs.

In a preferred embodiment of the invention, the recombinant polypeptide comprises eight operably linked RNA base-binding motifs comprising in consecutive order the amino acid sequences of SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, any one of SEQ ID NOS: 1-5, SEQ ID NO 11, and SEQ ID NO 12.

In another preferred embodiment of the invention, the recombinant polypeptide has a sequence of any one of SEQ ID NOS: 13-17.

In a further preferred embodiment of the invention, the amino acid spacers are derived from SEQ ID NO 23 or SEQ ID NO 24, or part thereof.

BRIEF DESCRIPTION OF THE DRAWINGS

Further features of the present invention are more fully described in the following description of several non-limiting embodiments thereof. This description is included solely for the purposes of exemplifying the present invention. It should not be understood as a restriction on the broad summary, disclosure or description of the invention as set out above. The description will be made with reference to the accompanying drawings in which:

FIG. 1 shows (a) a ribbon representation of the tertiary protein structure of the human PUM1 PUF domain in complex with RNA, (b) a schematic representation of the recognition of RNA bases in the NRE RNA by the PUF repeats of PUM1, and (c) a close up view of the side-chain interactions of adenine (upper panel), uracil (middle panel) and guanine (lower panel) of the RNA target and the human PUM1 PUF domain shown in FIG. 1 illustrating how individual repeats bind adenine, guanine and uracil;

FIG. 2 is a diagrammatic representation and alignment of the amino acid sequences comprising recombinant PUF domain repeat 6 of embodiments of the invention, as compared with PUF domain repeats 5 and 7. The recombinant PUF domain repeat 6 of the preferred embodiments of the invention has amino acids with small or nucleophilic side chains such as glycine (GR), alanine (AR), serine (SR), threonine (TR) and cysteine (CR) at position 12 (left box) together with arginine at position 16 (right box). The residues at positions 12 and 16 were randomised and combinations that could recognize cytosine were selected from the library using the yeast three-hybrid system;

FIG. 3 is a photograph of a plate yeast three-hybrid growth assay of selected Saccharomyces cerevisiae YBZ-1 transformants variously co-expressing the recombinant proteins of the invention and RNA expression plasmids. Growth of the selected clones on selective SC media lacking leucine, uracil, and histidine, supplemented with 0.5 mM 3-amino triazole was indicative of his3 reporter gene activation and therefore specific RNA-recombinant protein interaction;

FIG. 4 is a series of bar graphs showing the results of β-galactosidase assays of selected Saccharomyces cerevisiae YBZ-1 transformants variously co-expressing the recombinant proteins of the invention and RNA expression plasmids. The ratio of luminescence to cell density (RLU×10⁵/OD₆₀₀), as quantified using β-galactosidase assays for the detection of lacZ reporter gene activation, was plotted against the identity of the base at position 3 of the target RNA sequence. Increased luminescence was indicative of specific RNA-recombinant protein interaction;

FIG. 5 is a scan of an SDS PAGE gel showing resolved samples of the purified preparations of GR and CR;

FIG. 6 shows (a) scans of 10% polyacrylamide gels on which RNA electrophoretic mobility shift assays using purified NQ (wildtype) (upper panel), CR (middle panel) and GR (lower panel) recombinant proteins and RNA oligonucleotides NRE and U3C were resolved, and (b) line graphs in which the percentage of RNA bound by varying concentrations of each protein is shown;

FIG. 7 shows the amino acid sequences of the GR, AR, SR, TR, and CR recombinant proteins of the invention in which the PUF domain repeats are internally aligned, showing amino acids at positions 12 and 16 of each PUF domain repeat as underlined, and engineered amino acids at positions 12 and 16 of each PUF domain repeat 6 as underlined and in bold; and

FIG. 8 shows (a) a photograph of a plate yeast three-hybrid growth assay (lower panel) of selected Saccharomyces cerevisiae YBZ-1 transformants co-expressing recombinant proteins of the invention having engineered PUF repeats in which residues 12 and 16 of the individual repeats were mutated to glycine and arginine respectively, and either the wildtype (NRE) RNA or a mutant NRE where the corresponding base was changed to cytosine. An accompanying bar graph showing corresponding β-galactosidase assays is shown in the upper panel. The engineered PUF repeats were introduced into each of the eight positions of the PUF RNA-base binding domain. Growth of the selected clones on selective SC media lacking leucine, uracil, and histidine, supplemented with 0.5 mM 3-amino triazole was indicative of his3 reporter gene activation and therefore specific RNA-recombinant protein interaction. The engineered PUF domain is able to bind to RNA targets that are located within substantially double stranded RNA structures as shown by (b) a photograph of a plate yeast three-hybrid growth assay of selected Saccharomyces cerevisiae YBZ-1 transformants co-expressing recombinant proteins of the invention engineered to target RNA with stem structures, as illustrated by the accompanying bar graph showing corresponding β-galactosidase assays (middle panel) and the diagrammatic representations of the target RNA with stem structures (upper panel).

FIG. 9 shows a PUF domain consisting of 16 RNA-binding repeats. The structure of the engineered 16 repeat PUF and its cognate RNA target are shown schematically in the left panel. A photograph of a plate yeast three-hybrid growth assay is shown on the lower panel demonstrating selected Saccharomyces cerevisiae YBZ-1 transformant survival on media with 0.5 mM 3-aminotriazole and lacking histidine. A bar graph showing the results of β-galactosidase assays used to determine the interaction of PUF domains and their RNA targets is shown in the upper panel. The NRE×2 mut1 RNA has the UGU triplet of the newly added target region mutated to CCC, and the NRE×2 mut2 RNA has the UGU triplet of the native NRE mutated to CCC. The wild type PUF protein is shown to bind just as well to NRE×2 mut1 as it does to NRE×2.

SEQ ID NO 1 is the amino acid sequence of the GR repeat of Example 1;

SEQ ID NO 2 is the amino acid sequence of the AR repeat of Example 1;

SEQ ID NO 3 is the amino acid sequence of the SR repeat of Example 1;

SEQ ID NO 4 is the amino acid sequence of the TR repeat of Example 1;

SEQ ID NO 5 is the amino acid sequence of the CR repeat of Example 1;

SEQ ID NO 6 is the amino acid sequence of repeat 1 of wild type human PUM1;

SEQ ID NO 7 is the amino acid sequence of repeat 2 of wild type human PUM1;

SEQ ID NO 8 is the amino acid sequence of repeat 3 of wild type human PUM1;

SEQ ID NO 9 is the amino acid sequence of repeat 4 of wild type human PUM1;

SEQ ID NO 10 is the amino acid sequence of repeat 5 of wild type human PUM1;

SEQ ID NO 11 is the amino acid sequence of repeat 6 of wild type human PUM1;

SEQ ID NO 12 is the amino acid sequence of repeat 7 of wild type human PUM1;

SEQ ID NO 13 is the amino acid sequence of repeat 8 of wild type human PUM1;

SEQ ID NO 14 is the amino acid sequence of the GR protein of Example 1 and of the GR protein-repeat 6 of Example 2;

SEQ ID NO 15 is the amino acid sequence of the AR protein of Example 1;

SEQ ID NO 16 is the amino acid sequence of the SR protein of Example 1;

SEQ ID NO 17 is the amino acid sequence of the TR protein of Example 1;

SEQ ID NO 18 is the amino acid sequence of the CR protein of Example 1;

SEQ ID NO 19 is the DNA sequence encoding the GR protein of Example 1 and the GR protein-repeat 6 of Example 2;

SEQ ID NO 20 is the DNA sequence encoding the AR protein of Example 1;

SEQ ID NO 21 is the DNA sequence encoding the SR protein of Example 1;

SEQ ID NO 22 is the DNA sequence encoding the TR protein of Example 1;

SEQ ID NO 23 is the DNA sequence encoding the CR protein of Example 1;

SEQ ID NO 24 is the amino acid sequence of the GR protein—repeat 1 of Example 2;

SEQ ID NO 25 is the amino acid sequence of the GR protein—repeat 2 of Example 2;

SEQ ID NO 26 is the amino acid sequence of the GR protein—repeat 3 of Example 2;

SEQ ID NO 27 is the amino acid sequence of the GR protein—repeat 4 of Example 2;

SEQ ID NO 28 is the amino acid sequence of the GR protein—repeat 5 of Example 2;

SEQ ID NO 29 is the amino acid sequence of the GR protein—repeat 7 of Example 2;

SEQ ID NO 30 is the amino acid sequence of the GR protein—repeat 8 of Example 2;

SEQ ID NO 31 is the DNA sequence encoding the GR protein—repeat 1 of Example 2;

SEQ ID NO 32 is the DNA sequence encoding the GR protein—repeat 2 of Example 2;

SEQ ID NO 33 is the DNA sequence encoding the GR protein—repeat 3 of Example 2;

SEQ ID NO 34 is the DNA sequence encoding the GR protein—repeat 4 of Example 2;

SEQ ID NO 35 is the DNA sequence encoding the GR protein—repeat 5 of Example 2;

SEQ ID NO 36 is the DNA sequence encoding the GR protein—repeat 7 of Example 2;

SEQ ID NO 37 is the DNA sequence encoding the GR protein—repeat 8 of Example 2;

SEQ ID NO 38 is the cDNA sequence from human PUM1, NM_014676, that encodes the Puf domain;

SEQ ID NO 39 is the amino acid sequence of the Puf domain from human PUM1, amino acids 828 to 1176;

SEQ ID NO 40 is the DNA sequence encoding PUF×2, the 16 repeat PUF protein, of Example 3;

SEQ ID NO 41 is the amino acid sequence of PUF×2, the 16 repeat PUF protein;

SEQ ID NO 42 is the RNA sequence to which the NRE primer corresponds;

SEQ ID NO 43 is the RNA sequence to which the NREU3A primer corresponds;

SEQ ID NO 44 is the RNA sequence to which the NREU3C primer corresponds;

SEQ ID NO 45 is the RNA sequence to which the NREU3G primer corresponds;

SEQ ID NO 46 is the RNA sequence to which the NRE (FI) primer corresponds;

SEQ ID NO 47 is the RNA sequence to which the NREU3C (FI) primer corresponds;

SEQ ID NO 48 is the RNA sequence to which the NREU1C primer corresponds;

SEQ ID NO 49 is the RNA sequence to which the NREG2C primer corresponds;

SEQ ID NO 50 is the RNA sequence to which the NREA4C primer corresponds;

SEQ ID NO 51 is the RNA sequence to which the NREU5C primer corresponds;

SEQ ID NO 52 is the RNA sequence to which the NREA6C primer corresponds;

SEQ ID NO 53 is the RNA sequence to which the NREU7C primer corresponds;

SEQ ID NO 54 is the RNA sequence to which the NREA8C primer corresponds;

SEQ ID NO 55 is the RNA sequence to which the NREstem5 primer corresponds;

SEQ ID NO 56 is the RNA sequence to which the NREstem6 primer corresponds;

SEQ ID NO 57 is the RNA sequence to which the NREstem7 primer corresponds;

SEQ ID NO 58 is the RNA sequence to which the NREstem8 primer corresponds;

SEQ ID NO 59 is the RNA sequence to which the NREx2 primer corresponds;

SEQ ID NO 60 is the RNA sequence to which the NREx2mut1 primer corresponds; and

SEQ ID NO 61 is the RNA sequence to which the NREx2mut2 primer corresponds.

DESCRIPTION OF EMBODIMENTS

Throughout this specification, unless the context requires otherwise, the word “comprise” or variations such as “comprises” or “comprising”, will be understood to imply the inclusion of a stated integer or group of integers but not the exclusion of any other integer or group of integers.

General

Standard techniques may be used for recombinant DNA molecule and protein production, as well as for tissue culture and cell transformation. Protein purification techniques are typically performed according to the manufacturer's specifications or as commonly accomplished in the art using conventional procedures such as those set forth in Sambrook et al. (Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. [1989]), or as described herein. Unless specific definitions are provided, the nomenclature used in connection with, and the laboratory procedures and techniques of, molecular biology and biochemistry described herein are those well known and commonly used in the art.

Each document, reference, patent application or patent cited in this text is expressly incorporated herein in its entirety by reference, which means that it should be read and considered by the reader as part of this text. That the document, reference, patent application or patent cited in this text is not repeated in this text is merely for reasons of conciseness.

Reference to cited material or information contained in the text should not be understood as a concession that the material or information was part of the common general knowledge or was known in Australia or any other country.

Those skilled in the art will appreciate that the invention described herein is susceptible to variations and modifications other than those specifically described. It is to be understood that the invention includes all such variations and modifications. The invention also includes all of the steps, features, compositions and compounds referred to or indicated in the specification, individually or collectively, and any and all combinations of any two or more of the steps or features.

The present invention is not to be limited in scope by the specific embodiments described herein, which are intended for the purpose of exemplification only. Functionally equivalent products, compositions and methods are clearly within the scope of the invention as described herein.

DETAILED DESCRIPTION OF THE EMBODIMENTS OF THE INVENTION

A recombinant polypeptide is provided comprising at least one PUF RNA-binding domain capable of specifically binding to a cytosine RNA base.

The PUF RNA-binding domain of the recombinant polypeptide may comprise at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.
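By way of illustration only, the residue groups of the general formula above can be written as a position-by-position pattern and used to screen candidate 11-residue motifs. The following Python sketch is not part of the described methods. It assumes, for simplicity, that each "group including" list is exhaustive, which the wording of the formula does not require, and the names `ALLOWED`, `fits_general_formula` and `mismatched_positions` as well as the example strings are hypothetical. A sequence match says nothing about the functional requirement that the motif be capable of specifically binding a cytosine RNA base, which can only be established experimentally.

```python
import re

# Residues listed for positions X1..X11 of the general formula.
# ASSUMPTION: the lists are treated as closed sets for illustration only.
ALLOWED = [
    "QVMPEK",    # X1
    "HFYN",      # X2
    "GA",        # X3
    "GASTC",     # X4
    "RYHN",      # X5
    "FLV",       # X6
    "ILV",       # X7
    "R",         # X8
    "LKRQH",     # X9
    "KFACIVLM",  # X10
    "LFIV",      # X11
]

# One character class per position, e.g. "[QVMPEK][HFYN][GA]...".
MOTIF_RE = re.compile("".join(f"[{group}]" for group in ALLOWED))


def fits_general_formula(motif: str) -> bool:
    """Return True if the string has 11 residues, each in its listed group."""
    return MOTIF_RE.fullmatch(motif.upper()) is not None


def mismatched_positions(motif: str) -> list:
    """Return the 1-based positions whose residue is outside the listed group."""
    if len(motif) != len(ALLOWED):
        raise ValueError("motif must contain exactly 11 residues")
    return [
        position
        for position, (residue, group) in enumerate(zip(motif.upper(), ALLOWED), start=1)
        if residue not in group
    ]
```

For example, the hypothetical string `QHGSRFIRLKL` fits the pattern, whereas `QHGSRFIKLKL` does not because position 8 is not arginine; `mismatched_positions` reports which positions fall outside the listed groups.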

In an embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is glutamine (Q); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is valine (V); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is methionine (M); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is proline (P); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is glutamic acid (E); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.
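Each of the six preceding embodiments narrows the general formula by fixing X₁ to a single residue while leaving the groups for X₂ to X₁₁ unchanged. As a hedged illustration, the Python sketch below expresses this narrowing as "pinning" one position and counts the distinct 11-residue sequences that remain. It assumes that the listed groups are exhaustive, which the "group including" wording does not require, and the names `GENERAL`, `pin` and `count_sequences` are hypothetical.

```python
from math import prod

# Residue groups for X1..X11 as listed in the general formula.
# ASSUMPTION: treated as closed sets purely for counting purposes.
GENERAL = {
    1: "QVMPEK",
    2: "HFYN",
    3: "GA",
    4: "GASTC",
    5: "RYHN",
    6: "FLV",
    7: "ILV",
    8: "R",
    9: "LKRQH",
    10: "KFACIVLM",
    11: "LFIV",
}


def pin(position: int, residue: str, groups: dict = GENERAL) -> dict:
    """Return a copy of the groups with one position fixed to a single residue."""
    if residue not in groups[position]:
        raise ValueError(f"{residue} is not listed for X{position}")
    narrowed = dict(groups)
    narrowed[position] = residue
    return narrowed


def count_sequences(groups: dict) -> int:
    """Number of distinct sequences: the product of the group sizes."""
    return prod(len(group) for group in groups.values())
```

Under this assumption the general formula spans 6 × 4 × 2 × 5 × 4 × 3 × 3 × 1 × 5 × 8 × 4 = 1,382,400 sequences, and fixing X₁ to any one of its six listed residues, for example `pin(1, "Q")`, leaves 230,400.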

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is histidine (H); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is phenylalanine (F); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is tyrosine (Y); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is glycine (G); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is glycine (G); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is alanine (A); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is serine (S); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is threonine (T); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is arginine (R); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is tyrosine (Y); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is histidine (H); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is phenylalanine (F); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is leucine (L); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is isoleucine (I); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is leucine (L); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is leucine (L); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is lysine (K); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is arginine (R); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is glutamine (Q); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is lysine (K); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is phenylalanine (F); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is alanine (A); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is cysteine (C); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is isoleucine (I); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is valine (V); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is leucine (L); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is leucine (L); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is phenylalanine (F); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is isoleucine (I); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.

In another embodiment of the invention, the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.
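The embodiments above differ only in which position of the formula is fixed to a single residue. By way of illustration only, the broadest residue groups recited for each position can be encoded as a lookup table and used to screen a candidate 11-residue motif. The following is a minimal Python sketch; the names `ALLOWED_RESIDUES` and `matches_general_formula` are hypothetical, and treating each "group including" as a closed set is a simplifying assumption. A match indicates only that a sequence falls within the recited formula, not that it binds a cytosine RNA base.

```python
# Broadest residue groups recited for positions X1..X11 (one-letter codes).
# Assumption: each "group including ..." is treated here as a closed set.
ALLOWED_RESIDUES = [
    set("QVMPEK"),    # X1
    set("HFYN"),      # X2
    set("GA"),        # X3
    set("GASTC"),     # X4
    set("RYHN"),      # X5
    set("FLV"),       # X6
    set("ILV"),       # X7
    set("R"),         # X8
    set("LKRQH"),     # X9
    set("KFACIVLM"),  # X10
    set("LFIV"),      # X11
]


def matches_general_formula(motif, pinned=None):
    """Return True if `motif` fits X1..X11, optionally with pinned positions.

    `pinned` maps a 1-based position to the single residue required there,
    mirroring an embodiment such as "X9 is histidine (H)" -> {9: "H"}.
    """
    motif = motif.upper()
    if len(motif) != len(ALLOWED_RESIDUES):
        return False
    for index, (residue, allowed) in enumerate(zip(motif, ALLOWED_RESIDUES), start=1):
        if residue not in allowed:
            return False
        if pinned and index in pinned and residue != pinned[index]:
            return False
    return True
```

For example, `matches_general_formula("QYGGYVIRHVL", pinned={9: "H"})` evaluates to `True`, whereas pinning position 5 to asparagine with `pinned={5: "N"}` evaluates to `False` for the same sequence, because its fifth residue is tyrosine.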

Preferably, the PUF RNA-binding domain of the recombinant polypeptide comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is glutamine (Q); X₂ is tyrosine (Y); X₃ is glycine (G); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is tyrosine (Y); X₆ is valine (V); X₇ is isoleucine (I); X₈ is arginine (R); X₉ is histidine (H); X₁₀ is valine (V); and X₁₁ is leucine (L).

In a more preferred embodiment, the PUF RNA-binding domain of the recombinant polypeptide comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is glutamine (Q); X₂ is tyrosine (Y); X₃ is glycine (G); X₄ is glycine (G); X₅ is tyrosine (Y); X₆ is valine (V); X₇ is isoleucine (I); X₈ is arginine (R); X₉ is histidine (H); X₁₀ is valine (V); and X₁₁ is leucine (L).

In a more preferred embodiment, the PUF RNA-binding domain of the recombinant polypeptide comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is glutamine (Q); X₂ is tyrosine (Y); X₃ is glycine (G); X₄ is alanine (A); X₅ is tyrosine (Y); X₆ is valine (V); X₇ is isoleucine (I); X₈ is arginine (R); X₉ is histidine (H); X₁₀ is valine (V); and X₁₁ is leucine (L).

In a more preferred embodiment, the PUF RNA-binding domain of the recombinant polypeptide comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is glutamine (Q); X₂ is tyrosine (Y); X₃ is glycine (G); X₄ is serine (S); X₅ is tyrosine (Y); X₆ is valine (V); X₇ is isoleucine (I); X₈ is arginine (R); X₉ is histidine (H); X₁₀ is valine (V); and X₁₁ is leucine (L).

In a more preferred embodiment, the PUF RNA-binding domain of the recombinant polypeptide comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is glutamine (Q); X₂ is tyrosine (Y); X₃ is glycine (G); X₄ is threonine (T); X₅ is tyrosine (Y); X₆ is valine (V); X₇ is isoleucine (I); X₈ is arginine (R); X₉ is histidine (H); X₁₀ is valine (V); and X₁₁ is leucine (L).

In a more preferred embodiment, the PUF RNA-binding domain of the recombinant polypeptide comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is glutamine (Q); X₂ is tyrosine (Y); X₃ is glycine (G); X₄ is cysteine (C); X₅ is tyrosine (Y); X₆ is valine (V); X₇ is isoleucine (I); X₈ is arginine (R); X₉ is histidine (H); X₁₀ is valine (V); and X₁₁ is leucine (L).

The PUF RNA-binding domain may comprise at least one RNA base-binding motif of the general formula QYGXYVIRHVL wherein X is an amino acid with a small or nucleophilic side chain. More particularly, the PUF RNA-binding domain may comprise at least one RNA base-binding motif of the general formula QYGXYVIRHVL wherein X is an amino acid selected from the group comprising glycine (G), alanine (A), serine (S), threonine (T), and cysteine (C).
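The formula QYGXYVIRHVL with X drawn from G, A, S, T, and C describes five concrete 11-residue sequences. As an illustrative sketch only, these can be enumerated and searched for in a protein sequence with a regular expression. The names `CYTOSINE_MOTIF`, `expand_motif_variants`, and `find_motifs` are hypothetical, and the example sequence in the usage note is invented for demonstration.

```python
import re

# QYGXYVIRHVL with X restricted to G, A, S, T or C, as recited in the text.
CYTOSINE_MOTIF = re.compile(r"QYG[GASTC]YVIRHVL")


def expand_motif_variants():
    """List the five sequences covered by QYGXYVIRHVL (X = G, A, S, T, C)."""
    return ["QYG" + x + "YVIRHVL" for x in "GASTC"]


def find_motifs(protein_sequence):
    """Return (0-based start offset, matched motif) pairs in a one-letter sequence."""
    return [
        (match.start(), match.group())
        for match in CYTOSINE_MOTIF.finditer(protein_sequence.upper())
    ]
```

For instance, `find_motifs("MKQYGSYVIRHVLAA")` returns `[(2, "QYGSYVIRHVL")]`. Finding the motif in a sequence shows only that the residues are present; it does not by itself establish binding specificity.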

The cytosine may be provided in the form of a target RNA sequence in a target RNA molecule; the target RNA molecule may be any one of messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), non-coding RNA, and RNA interference molecules such as short interfering RNA (siRNA) and micro RNA (miRNA).

The term “capable of specifically binding to a cytosine RNA base” as used herein refers to the ability of the PUF RNA-binding domain of the present invention to selectively recognize, interact with, and bind a cytosine RNA base relative to an adenosine, guanine, or uracil RNA base.

The PUF RNA-binding domain may comprise a plurality of RNA base-binding motifs, at least one of which is capable of specifically binding to a cytosine RNA base. The plurality of RNA base-binding motifs may further comprise a first RNA base-binding motif capable of specifically binding to a cytosine RNA base and a second RNA base-binding motif capable of specifically binding to an RNA base comprising one of adenosine, guanine, or uracil, wherein the first and second RNA base-binding motifs are synergistically operable to specifically bind the RNA bases.

Typically, the PUF RNA-binding domain comprises a plurality of consecutively ordered RNA base-binding motifs synergistically operable to bind a target RNA molecule with a target RNA sequence, each RNA base-binding motif capable of specifically binding to a cytosine, adenosine, guanine, or uracil RNA base, wherein the consecutive order of the RNA base-binding motifs corresponds with the consecutive order of the RNA bases in the target RNA sequence.
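The correspondence between motif order and base order described above can be sketched as a simple lookup. The Python example below is a minimal illustration under stated assumptions: `CYTOSINE_MOTIF` is one motif of the general formula QYGXYVIRHVL (X = serine), the entries for adenosine, guanine and uracil are placeholder labels rather than real sequences, and the function name `motif_order_for_target` is hypothetical. The sketch does not address how the motif order is oriented along the polypeptide chain relative to the 5′ to 3′ direction of the RNA.

```python
# Illustrative sketch only: list one base-binding motif per base of a
# target RNA sequence, in the same consecutive order as the bases.

CYTOSINE_MOTIF = "QYGSYVIRHVL"  # general formula QYGXYVIRHVL with X = Ser

# Placeholder labels (assumptions); real A-, G- and U-binding motif
# sequences are not reproduced in this example.
MOTIF_FOR_BASE = {
    "C": CYTOSINE_MOTIF,
    "A": "<A-binding motif>",
    "G": "<G-binding motif>",
    "U": "<U-binding motif>",
}


def motif_order_for_target(target_rna):
    """Return the motifs whose consecutive order matches the target bases."""
    target = target_rna.upper()
    unknown = sorted(set(target) - set(MOTIF_FOR_BASE))
    if unknown:
        raise ValueError(f"not an RNA base: {unknown}")
    return [MOTIF_FOR_BASE[base] for base in target]


if __name__ == "__main__":
    for base, motif in zip("UGCAUAUA", motif_order_for_target("UGCAUAUA")):
        print(base, motif)
```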

The target RNA molecule may be mRNA encoding a reporter protein including, but not limited to, his3, β-galactosidase, GFP, RFP, YFP, luciferase, β-glucuronidase, and alkaline phosphatase.

Preferably, the plurality of consecutively ordered RNA base-binding motifs comprises at least one first RNA base-binding motif having a sequence of any one of SEQ ID NOS: 1-5 and at least one second RNA base-binding motif having a sequence of any one of SEQ ID NOS: 6-13.

The plurality of RNA base-binding motifs may comprise between 2 and 40 RNA base-binding motifs. Preferably, the plurality of RNA base-binding motifs comprises between 8 and 16 RNA base-binding motifs.

The PUF RNA-binding domain may comprise a plurality of RNA base-binding motifs operably linked via amino acid spacers. Such amino acid spacers may include those typically used by persons skilled in the art, and may be derived, wholly or in part, from any one of human PUM1 (AF315592), human PUM2 (AF315591), mouse Pum1 (AF321909), mouse Pum2 (AF315590), Xenopus XPum1 (AAL14121), Xenopus XPum2 (BAB20864), Trypanosoma Puf1, Ce Puf-7 (B0273.2), Sc Puf6p (YDR496c), Ce Puf-11 (Y73B6BL.10), Sc Puf1p (JSN1), Sc Puf2p (YPR042C), Ce FBF-2, Ce FBF-1, Ce Puf-3 (Y45F10A.2), Ce Puf-4 (M4.2), Ce Puf-5 (F54C9.8), Ce Puf-6 (F18A11.1), Ce Puf-10 (Y48G1BL.3), Sc Puf5p (MPT5), Dictyost. PufA, DrPum, Anopheles Pum, XPum1, Pum1, PUM1, zfPum1, XPum2, Pum2, PUM2, zfPum2, Ce Puf-8 (C30G12.7), Ce Puf-9 (W06B11.2), AraF14P13.4, AraQ9ZW07, AraF16P2.43, AraF16P2.48, AraAT4g25880, AraF14M19.160, Sc Puf3p (YLL013C), Plasmodium Pum, Sc Puf4p (YGL023), AraF14D7.5, AraT15F16.9, zebrafish zfPum1 (BQ133093) and zebrafish zfPum2 (AI558582).

The fusion protein of the invention may comprise at least one PUF RNA-binding domain capable of specifically binding to a cytosine RNA base, and an effector domain.

The effector domain of the fusion protein of the invention may be any domain capable of interacting with RNA, whether transiently or irreversibly, directly or indirectly, including but not limited to an effector domain selected from the group comprising: endonucleases (for example RNase III, the CRR22 DYW domain, Dicer, and PIN (PilT N-terminus) domains from proteins such as SMG5 and SMG6); proteins and protein domains responsible for stimulating RNA cleavage (for example CPSF, CstF, CFIm and CFIIm); exonucleases (for example XRN-1 or Exonuclease T); deadenylases (for example HNT3); proteins and protein domains responsible for nonsense mediated RNA decay (for example UPF1, UPF2, UPF3, UPF3b, RNP S1, Y14, DEK, REF2, and SRm160); proteins and protein domains responsible for stabilizing RNA (for example PABP); proteins and protein domains responsible for repressing translation (for example Ago2 and Ago4); proteins and protein domains responsible for stimulating translation (for example Staufen); proteins and protein domains responsible for polyadenylation of RNA (for example PAP1, GLD-2, and Star-PAP); proteins and protein domains responsible for polyuridinylation of RNA (for example CID1 and terminal uridylate transferase); proteins and protein domains responsible for RNA localization (for example from IMP1, ZBP1, She2p, She3p, and Bicaudal-D); proteins and protein domains responsible for nuclear retention of RNA (for example Rrp6); proteins and protein domains responsible for nuclear export of RNA (for example TAP, NXF1, THO, TREX, REF, and Aly); proteins and protein domains responsible for repression of RNA splicing (for example PTB, Sam68, and hnRNP A1); proteins and protein domains responsible for stimulation of RNA splicing (for example Serine/Arginine-rich (SR) domains); proteins and protein domains responsible for reducing the efficiency of transcription (for example FUS (TLS)); and proteins and protein domains responsible for stimulating transcription (for example CDK7 and HIV Tat).

Alternatively, the effector domain may be selected from the group comprising endonucleases; proteins and protein domains capable of stimulating RNA cleavage; exonucleases; deadenylases; proteins and protein domains having nonsense mediated RNA decay activity; proteins and protein domains capable of stabilizing RNA; proteins and protein domains capable of repressing translation; proteins and protein domains capable of stimulating translation; proteins and protein domains capable of polyadenylation of RNA; proteins and protein domains capable of polyuridinylation of RNA; proteins and protein domains having RNA localization activity; proteins and protein domains capable of nuclear retention of RNA; proteins and protein domains having RNA nuclear export activity; proteins and protein domains capable of repression of RNA splicing; proteins and protein domains capable of stimulation of RNA splicing; proteins and protein domains capable of reducing the efficiency of transcription; and proteins and protein domains capable of stimulating transcription.

The PUF RNA-binding domain of the fusion protein of the invention may comprise at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein

-   X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K);
-   X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N);
-   X₃ is selected from the group including glycine (G) and alanine (A);
-   X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C);
-   X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N);
-   X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V);
-   X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V);
-   X₈ is arginine (R);
-   X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H);
-   X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and
-   X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V);

and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.
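The positional definitions of X₁ to X₁₁ above lend themselves to a simple sequence check. The following Python sketch tests whether an 11-residue sequence falls within the general formula; the names `ALLOWED_RESIDUES` and `matches_general_formula` are hypothetical and introduced for illustration only. The check concerns sequence membership alone and says nothing about whether a matching motif actually binds cytosine.

```python
# Illustrative sketch only: test an 11-residue sequence against the
# per-position residue groups given for X1..X11 (one-letter codes).

ALLOWED_RESIDUES = [
    "QVMPEK",     # X1
    "HFYN",       # X2
    "GA",         # X3
    "GASTC",      # X4
    "RYHN",       # X5
    "FLV",        # X6
    "ILV",        # X7
    "R",          # X8
    "LKRQH",      # X9
    "KFACIVLM",   # X10
    "LFIV",       # X11
]


def matches_general_formula(motif):
    """Return True if every residue is permitted at its position."""
    if len(motif) != len(ALLOWED_RESIDUES):
        return False
    return all(
        residue in allowed
        for residue, allowed in zip(motif.upper(), ALLOWED_RESIDUES)
    )


if __name__ == "__main__":
    for candidate in ("QYGSYVIRHVL", "QYGNYVIRHVL"):
        print(candidate, matches_general_formula(candidate))
```

In this example the second candidate is rejected because asparagine (N) is not among the residues listed for X₄.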

The PUF RNA-binding domain of the fusion protein of the invention may comprise at least one RNA base-binding motif of the general formula QYGXYVIRHVL wherein X is an amino acid with a small or nucleophilic side chain. In particular, X may be an amino acid selected from the group comprising glycine (G), alanine (A), serine (S), threonine (T), and cysteine (C).

The cytosine to which the PUF RNA-binding domain of the fusion protein of the invention is operably capable of specifically binding may be provided in the form of a target RNA sequence in a target RNA molecule; the target RNA molecule may be any one of messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), non-coding RNA, and RNA interference molecules such as short interfering RNA (siRNA) and micro RNA (miRNA).

The PUF RNA-binding domain of the fusion protein of the invention may comprise a plurality of RNA base-binding motifs, at least one of which is capable of specifically binding to a cytosine RNA base. The plurality of RNA base-binding motifs may further comprise a first RNA base-binding motif capable of specifically binding to a cytosine RNA base and a second RNA base-binding motif capable of specifically binding to an RNA base comprising one of adenosine, guanine, or uracil, wherein the first and second RNA base-binding motifs are synergistically operable to specifically bind the RNA bases.

Preferably, the PUF RNA-binding domain of the fusion protein of the invention comprises a plurality of consecutively ordered RNA base-binding motifs synergistically operable to bind a target RNA molecule with a target RNA sequence, each RNA base-binding motif capable of specifically binding to a cytosine, adenosine, guanine, or uracil RNA base, wherein the consecutive order of the RNA base-binding motifs corresponds with the consecutive order of the RNA bases in the target RNA sequence.

The target RNA molecule to which the fusion protein of the invention is capable of binding may be mRNA encoding a reporter protein including, but not limited to, his3, β-galactosidase, GFP, RFP, YFP, luciferase, β-glucuronidase, and alkaline phosphatase.

Preferably, the plurality of consecutively ordered RNA base-binding motifs comprises at least one first RNA base-binding motif having a sequence of any one of SEQ ID NOS: 1-5 and at least one second RNA base-binding motif having a sequence of any one of SEQ ID NOS: 6-13.

The PUF RNA-binding domain of the fusion protein of the invention may comprise a plurality of RNA base-binding motifs operably linked via amino acid spacers. Such amino acid spacers may include those typically used by persons skilled in the art, and may be derived, wholly or in part, from any one of human PUM1 (AF315592), human PUM2 (AF315591), mouse Pum1 (AF321909), mouse Pum2 (AF315590), Xenopus XPum1 (AAL14121), Xenopus XPum2 (BAB20864), Trypanosoma Puf1, Ce Puf-7 (B0273.2), Sc Puf6p (YDR496c), Ce Puf-11 (Y73B6BL.10), Sc Puf1p (JSN1), Sc Puf2p (YPR042C), Ce FBF-2, Ce FBF-1, Ce Puf-3 (Y45F10A.2), Ce Puf-4 (M4.2), Ce Puf-5 (F54C9.8), Ce Puf-6 (F18A11.1), Ce Puf-10 (Y48G1BL.3), Sc Puf5p (MPT5), Dictyost. PufA, DrPum, Anopheles Pum, XPum1, Pum1, PUM1, zfPum1, XPum2, Pum2, PUM2, zfPum2, Ce Puf-8 (C30G12.7), Ce Puf-9 (W06B11.2), AraF14P13.4, AraQ9ZW07, AraF16P2.43, AraF16P2.48, AraAT4g25880, AraF14M19.160, Sc Puf3p (YLL013C), Plasmodium Pum, Sc Puf4p (YGL023), AraF14D7.5, AraT15F16.9, zebrafish zfPum1 (BQ133093) and zebrafish zfPum2 (AI558582).

The PUF RNA-binding domain and the effector domain of the fusion protein of the invention may be operably linked via a peptide spacer.

Due to the degeneracy of the genetic code, it will be well understood by one of ordinary skill in the art that substitution of nucleotides may be made without changing the amino acid sequence of the polypeptide. Therefore, the invention includes any nucleic acid sequence encoding a recombinant polypeptide comprising a PUF RNA-binding domain capable of specifically binding to a cytosine RNA base. Moreover, it is understood in the art that for a given protein's amino acid sequence, substitution of certain amino acids in the sequence can be made without significant effect on the function of the peptide. Such substitutions are known in the art as “conservative substitutions.” The invention encompasses a recombinant polypeptide comprising a PUF RNA-binding domain that contains conservative substitutions, wherein the function of the recombinant polypeptide in the specific binding of a cytosine RNA base is not altered. Generally, such a mutant recombinant polypeptide comprising a PUF RNA-binding domain will be at least 40% identical to any one of SEQ ID NOS: 1-5. More preferably, the mutant recombinant polypeptide comprising a PUF RNA-binding domain will be at least 45%; at least 50%; at least 55%; at least 60%; at least 65%; at least 70%; at least 75%; at least 80%; at least 85%; at least 90%; at least 95%; or at least 97% identical to any one of SEQ ID NOS: 1-5. Most preferably, the mutant recombinant polypeptide comprising a PUF RNA-binding domain will be at least 99% identical to any one of SEQ ID NOS: 1-5.
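To make the percentage identity thresholds above concrete, the following Python sketch computes identity between two sequences of equal length by counting matching positions. This is a simplifying assumption made for illustration: in practice identity is generally determined from a sequence alignment that can accommodate insertions and deletions, and the text does not prescribe a particular method. The function name `percent_identity` and the example sequences are hypothetical.

```python
# Illustrative sketch only: ungapped percent identity between two
# equal-length sequences (no alignment is performed).


def percent_identity(seq_a, seq_b):
    """Return the percentage of positions at which the residues match."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    # Two motifs of the general formula QYGXYVIRHVL differing only at X.
    identity = percent_identity("QYGSYVIRHVL", "QYGAYVIRHVL")
    print(f"{identity:.1f}% identical")
```

For the two 11-residue motifs shown, ten of eleven positions match, giving an identity of roughly 91%.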

The invention further provides for an isolated nucleic acid encoding the recombinant polypeptide or the fusion protein of the invention.

The isolated nucleic acid of the invention may have a sequence of any one of SEQ ID NOS: 19-23 or any one of SEQ ID NOS: 31-37 or SEQ ID NO: 40.

The isolated nucleic acid encoding the recombinant polypeptide or the fusion protein of the invention may be at least 40%; at least 45%; at least 50%; at least 55%; at least 60%; at least 65%; at least 70%; at least 75%; at least 80%; at least 85%; at least 90%; at least 95%; or at least 97% identical to any one of SEQ ID NOS: 19-23 or any one of SEQ ID NOS: 31-37 or SEQ ID NO: 40. Most preferably, the isolated nucleic acid encoding the recombinant polypeptide or the fusion protein is at least 99% identical to any one of SEQ ID NOS: 19-23 or any one of SEQ ID NOS: 31-37 or SEQ ID NO: 40.

The isolated nucleic acid may have a sequence selected from the group comprising any one of SEQ ID NOS: 19-23, any one of SEQ ID NOS: 31-37, or SEQ ID NO: 40, and a sequence at least 80% homologous to any one of SEQ ID NOS: 19-23 and 31-37 or SEQ ID NO: 40.

The recombinant vector of the invention may comprise nucleic acid encoding the recombinant polypeptide or the fusion protein of the invention.

The nucleic acid of the recombinant vector may have a sequence of any one of SEQ ID NOS: 19-23 or any one of SEQ ID NOS: 31-37 or SEQ ID NO: 40. The invention encompasses a recombinant vector comprising nucleic acid encoding the recombinant polypeptide or the fusion protein of the invention that is at least 40% identical to any one of SEQ ID NOS: 19-23 or any one of SEQ ID NOS: 31-37 or SEQ ID NO: 40. Preferably, the nucleic acid of the recombinant vector will be at least 45%; at least 50%; at least 55%; at least 60%; at least 65%; at least 70%; at least 75%; at least 80%; at least 85%; at least 90%; at least 95%; or at least 97% identical to any one of SEQ ID NOS: 19-23 or any one of SEQ ID NOS: 31-37 or SEQ ID NO: 40. Most preferably, the nucleic acid of the recombinant vector will be at least 99% identical to any one of SEQ ID NOS: 19-23 or any one of SEQ ID NOS: 31-37 or SEQ ID NO: 40.

The host cells of the invention may comprise nucleic acid encoding the recombinant polypeptide or the fusion protein of the invention, that is at least 40%; at least 45%; at least 50%; at least 55%; at least 60%; at least 65%; at least 70%; at least 75%; at least 80%; at least 85%; at least 90%; at least 95%; or at least 97% identical to any one of SEQ ID NOS: 19-23 or any one of SEQ ID NOS: 31-37 or SEQ ID NO: 40. Most preferably, the nucleic acid of the host cell will be at least 99% identical to any one of SEQ ID NOS: 19-23 or any one of SEQ ID NOS: 31-37 or SEQ ID NO: 40.

Any suitable host cell may be used, including prokaryotic and eukaryotic cells, and may be selected according to the chosen expression system, such as a bacterial, yeast, insect or mammalian expression system. Regulating sequences for gene expression in the various expression systems may be selected accordingly. Typically, the host cell of the invention is a yeast, preferably Saccharomyces cerevisiae YBZ-1.

The recombinant polypeptide of the invention or the fusion protein of the invention may further comprise an operable signal sequence such as those known in the art, including but not limited to a nuclear localization signal (NLS), a mitochondrial targeting sequence (MTS) and a secretion signal. The isolated nucleic acid of the invention, the nucleic acid of the recombinant vector of the invention, and the nucleic acid of the host cell of the invention may encode an operable signal sequence such as those known in the art, including but not limited to a nuclear localization signal (NLS), a mitochondrial targeting sequence (MTS) and a secretion signal. The recombinant polypeptide of the invention or the fusion protein of the invention may further comprise a protein tag such as those known in the art, including but not limited to an intein tag, a maltose binding protein domain tag, a histidine tag, a FLAG-tag, a biotin tag, a streptavidin tag, a starch binding protein domain tag, a hemagglutinin tag, and a fluorescent protein tag.

The polypeptides and proteins of the present invention may be modified peptides, i.e. peptides, which may contain amino acids modified by addition of any chemical residue, such as phosphorylated or myristylated amino acids.

The pharmaceutical composition of the invention comprises the recombinant polypeptide of the invention or the fusion protein of the invention or the isolated nucleic acid of the invention or the recombinant vector of the invention.

The term “pharmaceutical composition” as used herein comprises the substances of the present invention and optionally one or more pharmaceutically acceptable carriers. The substances of the present invention may be formulated as pharmaceutically acceptable salts. Acceptable salts comprise acetate, methylester, HCl, sulfate, chloride and the like. The pharmaceutical compositions can be conveniently administered by any of the routes conventionally used for drug administration, for instance, orally, topically, parenterally or by inhalation. The substances may be administered in conventional dosage forms prepared by combining the drugs with standard pharmaceutical carriers according to conventional procedures. These procedures may involve mixing, granulating and compressing or dissolving the ingredients as appropriate to the desired preparation. It will be appreciated that the form and character of the pharmaceutically acceptable carrier or diluent is dictated by the amount of active ingredient with which it is to be combined, the route of administration and other well-known variables. The carrier(s) must be “acceptable” in the sense of being compatible with the other ingredients of the formulation and not deleterious to the recipient thereof. The pharmaceutical carrier employed may be, for example, either a solid or liquid. Exemplary of solid carriers are lactose, terra alba, sucrose, talc, gelatine, agar, pectin, acacia, magnesium stearate, stearic acid and the like. Exemplary of liquid carriers are phosphate buffered saline solution, syrup, oil such as peanut oil and olive oil, water, emulsions, various types of wetting agents, sterile solutions and the like. Similarly, the carrier or diluent may include time delay material well known to the art, such as glyceryl mono-stearate or glyceryl distearate alone or with a wax. The substance according to the present invention can be administered in various manners to achieve the desired effect.
Said substance can be administered either alone or formulated as pharmaceutical preparations to the subject being treated either orally, topically, parenterally or by inhalation. Moreover, the substance can be administered in combination with other substances either in a common pharmaceutical composition or as separated pharmaceutical compositions. The diluent is selected so as not to affect the biological activity of the combination. Examples of such diluents are distilled water, physiological saline, Ringer's solutions, dextrose solution, and Hank's solution. In addition, the pharmaceutical composition or formulation may also include other carriers, adjuvants, or nontoxic, nontherapeutic, nonimmunogenic stabilizers and the like. A therapeutically effective dose refers to that amount of the substance according to the invention which ameliorates the symptoms or condition. Therapeutic efficacy and toxicity of such compounds can be determined by standard pharmaceutical procedures in cell cultures or experimental animals, e.g., ED50 (the dose therapeutically effective in 50% of the population) and LD50 (the dose lethal to 50% of the population). The dose ratio between therapeutic and toxic effects is the therapeutic index, and it can be expressed as the ratio, LD50/ED50. The dosage regimen will be determined by the attending physician and other clinical factors; preferably in accordance with any one of the methods described above. As is well known in the medical arts, dosages for any one patient depend upon many factors, including the patient's size, body surface area, age, the particular compound to be administered, sex, time and route of administration, general health, and other drugs being administered concurrently. Progress can be monitored by periodic assessment.
Specific formulations of the substance according to the invention are prepared in a manner well known in the pharmaceutical art and usually comprise at least one active substance referred to herein above in admixture or otherwise associated with a pharmaceutically acceptable carrier or diluent thereof. For making those formulations the active substance(s) will usually be mixed with a carrier or diluted by a diluent, or enclosed or encapsulated in a capsule, sachet, cachet, paper or other suitable containers or vehicles. A carrier may be solid, semisolid, gel-based or liquid material, which serves as a vehicle, excipient or medium for the active ingredients. Said suitable carriers comprise those mentioned above and others well known in the art, see, e.g., Remington's Pharmaceutical Sciences, Mack Publishing Company, Easton, Pennsylvania. The formulations can be adapted to the mode of administration comprising the forms of tablets, capsules, suppositories, solutions, suspensions or the like. The dosing recommendations will be indicated in product labeling by allowing the prescriber to anticipate dose adjustments depending on the considered patient group, with information that avoids prescribing the wrong drug to the wrong patients at the wrong dose.
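The therapeutic index referred to above, expressed as the ratio LD50/ED50, can be illustrated with a short calculation. The Python sketch below uses the hypothetical function name `therapeutic_index` and invented dose values chosen purely for the arithmetic; they are not data for any substance of the invention.

```python
# Illustrative sketch only: therapeutic index as the ratio LD50 / ED50.


def therapeutic_index(ld50, ed50):
    """Return LD50 / ED50; both doses must be positive and in the same units."""
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50


if __name__ == "__main__":
    # Hypothetical values in mg/kg, for illustration only.
    print(therapeutic_index(ld50=500.0, ed50=25.0))
```

A larger ratio indicates a wider margin between the dose effective in 50% of the population and the dose lethal to 50% of the population.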

The PUF RNA-binding domain of the system of the invention, and of the kit of the invention, may comprise at least one RNA base-binding motif as described herein.

The PUF RNA-binding domain of the system of the invention, and of the kit of the invention, may comprise a plurality of RNA base-binding motifs operably linked via amino acid spacers. Such amino acid spacers may include those typically used by persons skilled in the art, and may be derived, wholly or in part, from any one of human PUM1 (AF315592), human PUM2 (AF315591), mouse Pum1 (AF321909), mouse Pum2 (AF315590), Xenopus XPum1 (AAL14121), Xenopus XPum2 (BAB20864), Trypanosoma Puf1, Ce Puf-7 (B0273.2), Sc Puf6p (YDR496c), Ce Puf-11 (Y73B6BL.10), Sc Puf1p (JSN1), Sc Puf2p (YPR042C), Ce FBF-2, Ce FBF-1, Ce Puf-3 (Y45F10A.2), Ce Puf-4 (M4.2), Ce Puf-5 (F54C9.8), Ce Puf-6 (F18A11.1), Ce Puf-10 (Y48G1BL.3), Sc Puf5p (MPT5), Dictyost. PufA, DrPum, Anopheles Pum, XPum1, Pum1, PUM1, zfPum1, XPum2, Pum2, PUM2, zfPum2, Ce Puf-8 (C30G12.7), Ce Puf-9 (W06B11.2), AraF14P13.4, AraQ9ZW07, AraF16P2.43, AraF16P2.48, AraAT4g25880, AraF14M19.160, Sc Puf3p (YLL013C), Plasmodium Pum, Sc Puf4p (YGL023), AraF14D7.5, AraT15F16.9, zebrafish zfPum1 (BQ133093) and zebrafish zfPum2 (AI558582).

Specific

Briefly, the invention provides recombinant polypeptides derived from human PUM1 comprising a PUF RNA-binding domain with eight RNA base binding motifs or repeats which are each capable of binding to an RNA base. The recombinant polypeptides of the invention include the first 5 RNA base binding motifs or repeats of human PUM1 (SEQ ID NOS: 6-10) and the last two RNA base binding motifs or repeats (SEQ ID NOS: 11-12). RNA base binding motif or repeat 6 was recombinantly engineered to have the general formula QYGXYVIRHVL in which X is an amino acid with a small or nucleophilic side chain, such as glycine (G), alanine (A), serine (S), threonine (T), and cysteine (C). This recombinantly engineered RNA base binding motif or repeat 6 was surprisingly capable of specifically binding to a cytosine RNA base.

While recombinantly engineered RNA base binding motifs or repeats have been prepared for specific binding to adenosine, guanine and uracil, no recombinantly engineered RNA base binding motif or repeat has previously been reported to be specifically capable of binding cytosine. It is thus intended that the modular arrangement of a PUF RNA-binding domain be exploited for the inventive design and preparation of recombinant polypeptides of the invention which are capable of binding to a target RNA molecule having any RNA sequence, simply by the appropriate arrangement of the RNA base binding motifs or repeats with respect to one another. While recombinant polypeptides having eight RNA base binding motifs are herein described, it is understood that the scope of the invention is not limited to the use of any specific number of repeats.

It is further intended that the recombinant RNA base-binding motifs capable of specifically binding cytosine as described herein be combined with other known RNA base-binding motifs capable of specifically binding to adenosine, guanine or uracil. Such a combination is expected to function synergistically to facilitate the specific binding of a PUF RNA-binding domain having an engineered consecutive order of RNA binding motifs to a target RNA molecule, the target RNA sequence of which corresponds with the engineered consecutive order of RNA binding motifs in the recombinant polypeptide. The preparation of such a recombinant polypeptide can be carried out by using recombinant methods known to those skilled in the art.

Effector domains may also be fused to the recombinant polypeptides using recombinant methods known to those skilled in the art to produce fusion proteins. The effector domain may be any domain capable of interacting with RNA, whether transiently or irreversibly, to effect events including but not limited to mRNA processing and transport, translation initiation or inhibition, and mRNA degradation.

It will be appreciated that the cytosine to which the recombinant polypeptide of the invention specifically binds may be present as part of a target RNA sequence in a target RNA molecule and that this RNA is not limited to messenger RNA (mRNA) but may be any one of transfer RNA (tRNA), ribosomal RNA (rRNA), non-coding RNA, and RNA interference molecules such as short interfering RNA (siRNA) and micro RNA (miRNA). Furthermore, the target RNA molecule may be an mRNA encoding a reporter protein such as his3, β-galactosidase, GFP, RFP, YFP, luciferase, β-glucuronidase, and alkaline phosphatase, where the invention is used as a research tool. The target RNA molecule may also be an endogenous mRNA transcript where the invention is used as a therapeutic agent. Such endogenous mRNA transcripts are understood to include those produced by infectious agents such as viruses and intracellular pathogens.

Reporter proteins may also be fused to the recombinant polypeptide where the invention is used as a research tool. Examples include his3, β-galactosidase, GFP, RFP, YFP, luciferase, β-glucuronidase, and alkaline phosphatase. Similarly, various tags for facilitating purification of proteins of the invention and/or signal sequences such as nuclear localization signals (NLSs), mitochondrial targeting sequences (MTSs) and secretion signals may be fused thereto, using recombinant methods known to those skilled in the art. Examples of such tags include an intein tag, a maltose binding protein domain tag, a histidine tag, a FLAG-tag, a biotin tag, a streptavidin tag, a starch binding protein domain tag, a hemagglutinin tag, and a fluorescent protein tag. The amino acid sequence or peptide spacers between the RNA base-binding motifs may be derived from any PUF domain containing protein provided the resultant recombinant polypeptides are capable of operably binding RNA bases as described herein.

The recombinant polypeptides, fusion proteins, isolated nucleic acids, and recombinant vectors of the invention may be used as research tools, in the form of compositions as described herein, or as pharmaceutical compositions if combined with a pharmaceutically acceptable carrier or excipient such as those known in the art.

The invention is further intended to be used to regulate the expression of a specific gene, and to this end methods, systems, and kits are herein provided for the modular preparation of isolated nucleic acid encoding RNA base-binding motifs in a desired consecutive order, which motifs are capable of specifically binding cytosine, adenosine, guanine or uracil in the order in which these bases occur in a corresponding RNA target molecule. Once such nucleic acid has been recombinantly prepared, it may be inserted into a recombinant vector, such as pGAD-RC, using methods known in the art, and introduced into a cell and expressed. The effect on gene expression may be monitored using techniques known in the art, including but not limited to yeast three-hybrid growth assays using the his3 reporter gene system, and β-galactosidase assays.

EXAMPLE 1

Introduction

Referring to FIGS. 1 and 2, PUF (Drosophila Pumilio and Caenorhabditis elegans FBF homology) domains are typically composed of eight 36 amino acid repeats, each repeat binding to a single nucleotide in its extended RNA target via hydrogen bonding or van der Waals contacts between the amino acids at positions 12 and 16 and the Watson-Crick edge of the base. The amino acid at position 13 makes a stacking interaction.

Although PUF domains with repeats that recognize adenine, guanine or uracil have been reported (Cheong, C. G. & Hall, T. M. (2006) PNAS 103: 13635-13639; Wang, X. et al. (2002) Cell 110: 501-512), the use of these PUF domains has been substantially hampered by the lack of residues known to specifically recognise cytosine.

Materials and Methods

Plasmids

To produce a Gal4p activation domain fused to a PUF domain, a synthetic gene encoding amino acids 828 to 1176 of the human PUM1 protein (GenBank accession no. NP_001018494, GENEART) was subcloned into pGAD-RC (pGAD-RC: Ito, T. et al. (2000) 97: 1143-1147). This plasmid was used as a template for library construction by enzymatic inverse PCR (enzymatic inverse PCR: Rackham, O. & Chin, J. W. (2005) Nat Chem Biol 1: 159-166) using primers in which the codons corresponding to amino acids 1043 and 1047 were encoded by mixtures of trimer phosphoramidites encoding all 20 amino acids (GeneWorks). Individual PUF domain mutants were also made by enzymatic inverse PCR. RNA expression plasmids were made by altering the multiple cloning site of pIIIA/MS2-2 (pIIIA/MS2-2: Stumpf, C. R. et al. (2008) Methods Enzymol 449: 295-315) according to the method of Cassiday and Maher (Cassiday, L. A. & Maher, L. J. (2001) Biochemistry 40: 2433-2438) and subcloning pairs of annealed oligonucleotides corresponding to the following RNA sequences (PUF recognition sequences in bold, site-specific mutations underlined):

NRE: (SEQ ID NO: 42) 5′-CCGGCUAGCAAUUGUAUAUAUUAAUUUAAUAAAGCAUG-3′;
NREU3A: (SEQ ID NO: 43) 5′-CCGGCUAGCAAUUG A AUAUAUUAAUUUAAUAAAGCAUG-3′;
NREU3C: (SEQ ID NO: 44) 5′-CCGGCUAGCAAUUG C AUAUAUUAAUUUAAUAAAGCAUG-3′;
NREU3G: (SEQ ID NO: 45) 5′-CCGGCUAGCAAUUG G AUAUAUUAAUUUAAUAAAGCAUG-3′.
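The names of the mutant targets can be read as wild-type base, position within the eight-nucleotide PUF recognition sequence, and substituted base. The following is a minimal Python sketch of that convention. It assumes that the recognition sequence is the UGUAUAUA stretch of the NRE (inferred from the listed sequences, since the bold marking is not preserved here); the helper name `point_mutant` is hypothetical and not part of the described method.

```python
# Wild-type NRE target (SEQ ID NO: 42) as listed above.
NRE = "CCGGCUAGCAAUUGUAUAUAUUAAUUUAAUAAAGCAUG"
# Assumed PUF recognition sequence within the NRE (5' to 3').
SITE = "UGUAUAUA"


def point_mutant(name: str, rna: str = NRE, site: str = SITE) -> str:
    """Build a mutant target from a short name such as 'U3C'.

    'U3C' means: base 3 of the recognition sequence, U in the wild type,
    is replaced by C. Positions are 1-based and counted 5' to 3'.
    """
    wild_type, position, new_base = name[0], int(name[1:-1]), name[-1]
    start = rna.index(site)        # first occurrence of the site in the RNA
    index = start + position - 1   # 0-based index within the full RNA
    if rna[index] != wild_type:
        raise ValueError(f"expected {wild_type} at site position {position}")
    return rna[:index] + new_base + rna[index + 1:]


# Rebuild the three Example 1 mutants from the wild-type NRE.
mutants = {f"NRE{m}": point_mutant(m) for m in ("U3A", "U3C", "U3G")}
for label, sequence in mutants.items():
    print(label, sequence)
```

Under these assumptions the generated strings can be compared directly with SEQ ID NOs: 43 to 45, which gives a quick check that an oligonucleotide order matches the intended substitution.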

PUF Library Selections

Saccharomyces cerevisiae YBZ-1 cells (MATa, ura3-52, leu2-3, 112, his3-200, trp1-1, ade2, LYS2::(LexAop)-HIS3, ura3::(lexA-op)-lacZ, LexA-MS2 coat (N55K)) (Hook, B. et al. (2005) RNA 11: 227-233) containing the NREU3C RNA expression plasmid were transformed with the PUF domain library in pGAD-RC by the LiAc method of Gietz and Woods (Gietz, R. D. & Woods, R. A. (2002) Methods Enzymol 350: 87-96), yielding 6×10⁵ primary transformants. Cells were amplified by overnight growth in SC media lacking leucine and uracil, washed in TE, and 1×10⁷ CFU were plated on SC agar lacking leucine, uracil and histidine, supplemented with 0.5 mM 3-amino triazole. Colonies were picked after three days and the plasmids were isolated and transformed into Escherichia coli DH10B. Transformants were screened by PCR to identify the PUF-encoding plasmid, which was sequenced and transformed into YBZ-1 to analyze the specificity of the mutant PUF domains, as described below.

Yeast Three-Hybrid Growth Assays

YBZ-1 transformants containing PUF domain and RNA expression plasmids were grown overnight in SC media lacking leucine and uracil, washed in SC media without amino acids, diluted to OD₆₀₀ of 0.1 and replica spotted onto SC media lacking leucine and uracil (to test for cell health and plasmid maintenance) and SC agar lacking leucine, uracil and histidine, supplemented with 0.5 mM 3-amino triazole (to test for RNA-protein interactions).

β-Galactosidase Assays

YBZ-1 transformants containing PUF domain and RNA expression plasmids were grown overnight in SC media lacking leucine and uracil, diluted to OD₆₀₀ of 0.1 and mixed with an equal volume of Beta-Glo reagent (Promega). After incubation for 1 h at room temperature, luminescence was detected using a FLUOstar OPTIMA (BMG Labtech).

Purification of PUF Proteins

PUF domains were subcloned into pTYB3 and expressed as a fusion to an intein and chitin-binding domain in Escherichia coli ER2566 cells (New England Biolabs). Cells were lysed by sonication in 20 mM sodium phosphate (pH 8.0), 1 M NaCl, and 0.1 mM PMSF. Lysates were clarified by centrifugation and incubated for 40 min with chitin beads (New England Biolabs). Beads were washed twice with 20 mM sodium phosphate (pH 8.0), 1 M NaCl, and 0.1 mM PMSF, once with 20 mM sodium phosphate (pH 8.0), 0.5 M NaCl, and 0.1 mM PMSF, and once with 20 mM sodium phosphate (pH 8.0), 0.15 M NaCl, and 0.1 mM PMSF. DTT was added to the beads to 50 mM final concentration and the tube was purged with nitrogen gas before incubation at room temperature with gentle rocking for three days. Cleaved PUF domain protein, free from the intein and chitin-binding domain, was collected, transferred into 10 mM Tris-HCl (pH 7.4), 150 mM NaCl, 5 mM β-mercaptoethanol and further purified on an ÄKTA-Explorer system (GE) using a Superdex 200 10/300 column (GE) with a total bed volume of 120 ml. Pure fractions were pooled and concentrated using Microsep 10K Omega centrifugal devices (PALL). Protein concentration was determined by the bicinchoninic acid (BCA) assay using bovine serum albumin (BSA) as a standard.

RNA Electrophoretic Mobility Shift Assays

Purified PUF domains were incubated at room temperature for 30 min with fluorescein-labeled RNA oligonucleotides (Dharmacon) in 10 mM HEPES (pH 8.0), 1 mM EDTA, 50 mM KCl, 2 mM DTT, 0.1 mg/ml fatty acid-free BSA, and 0.02% Tween-20. The following RNA sequences were used:

NRE: (SEQ ID NO: 46) 5′-(FI)AUUGUAUAUA-3′;
NREU3C: (SEQ ID NO: 47) 5′-(FI)AUUGCAUAUA-3′.

Reactions were analyzed by 10% PAGE in TAE and fluorescence was detected using a Typhoon TRIO scanner (GE).

Results

The eight repeats of the PUF domain of human PUM1 recognise only the adenine, guanine and uracil RNA bases of NRE RNA (FIG. 1). To address the need for a PUF domain capable of specifically recognising cytosine, a library of human PUM1 PUF domains was synthesized in which positions 12 and 16 of repeat six were randomized so as to encode all possible amino acids. The recombinant PUM1 PUF domain proteins were combined with an RNA target in which the corresponding base was altered to cytosine, and specific RNA-protein interactions were detected using the yeast three-hybrid system (Thomson, E. et al (2007) RNA 13: 2165-2174). Activation of the his3 reporter gene was expected to occur upon specific interaction between the recombinant PUM1 PUF domain proteins and the RNA targets. Five unique mutants were identified that exhibited protein-dependent reporter activation; their repeat 6 amino acid sequences are shown in FIG. 2. These protein variants interacted with RNAs containing a cytosine but not adenine, guanine or uracil, as determined by growth on selective media and quantified using β-galactosidase assays to examine activation of a lacZ reporter gene (FIGS. 3 and 4). All five protein variants exhibited an arginine at position 16 and an amino acid with a small or nucleophilic side chain at position 12 (glycine, alanine, serine, threonine, or cysteine).

Two of these protein variants (GR, with glycine at position 12 and arginine at position 16, and CR, with cysteine at position 12 and arginine at position 16) were recombinantly expressed in Escherichia coli and purified to homogeneity (FIG. 5). RNA electrophoretic mobility shift assays using these proteins showed a striking shift in specificity when the selected PUF domains with a cytosine-containing RNA were compared with the wild-type PUF domain and its cognate RNA (FIG. 6). The amino acid sequences of the GR, AR, SR, TR, and CR recombinant proteins of the invention are shown in FIG. 7.
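The residue pattern shared by the selected variants can be expressed as a simple check on the amino acids at positions 12 and 16 of a repeat. The following Python sketch is illustrative only: it encodes the pattern implied by the variant names GR, AR, SR, TR and CR (a small or nucleophilic residue at position 12, arginine at position 16), and the function name `matches_cytosine_pattern` is hypothetical.

```python
# Residues reported in this example for cytosine-binding repeats,
# written as one-letter amino acid codes.
CYTOSINE_POSITION_12 = {"G", "A", "S", "T", "C"}  # small or nucleophilic side chain
CYTOSINE_POSITION_16 = "R"                        # arginine


def matches_cytosine_pattern(pos12: str, pos16: str) -> bool:
    """Return True if a position 12/16 residue pair fits the selected pattern."""
    return (pos12.upper() in CYTOSINE_POSITION_12
            and pos16.upper() == CYTOSINE_POSITION_16)


# The five selected variants are named by their position 12 and 16 residues.
for variant in ("GR", "AR", "SR", "TR", "CR"):
    print(variant, matches_cytosine_pattern(variant[0], variant[1]))
```

A check of this kind only tests agreement with the reported pattern; it does not predict binding affinity, which was assessed experimentally.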

EXAMPLE 2 Materials and Methods Plasmids

Plasmids expressing individual PUF domain mutants were prepared according to Example 1. RNA expression plasmids were also prepared according to Example 1, except that pairs of annealed oligonucleotides corresponding to the following RNA sequences were used (PUF recognition sequences in bold, site-specific mutations underlined):

(SEQ ID NO: 42) NRE: 5′-CCGGCUAGCAAUUGUAUAUAUUAAUUUAAUAAAGCAUG-3′;
(SEQ ID NO: 48) NREU1C: 5′-CCGGCUAGCAAU C GUAUAUAUUAAUUUAAUAAAGCAUG-3′;
(SEQ ID NO: 49) NREG2C: 5′-CCGGCUAGCAAUU C UAUAUAUUAAUUUAAUAAAGCAUG-3′;
(SEQ ID NO: 44) NREU3C: 5′-CCGGCUAGCAAUUG C AUAUAUUAAUUUAAUAAAGCAUG-3′;
(SEQ ID NO: 50) NREA4C: 5′-CCGGCUAGCAAUUGU C UAUAUUAAUUUAAUAAAGCAUG-3′;
(SEQ ID NO: 51) NREU5C: 5′-CCGGCUAGCAAUUGUA C AUAUUAAUUUAAUAAAGCAUG-3′;
(SEQ ID NO: 52) NREA6C: 5′-CCGGCUAGCAAUUGUAU C UAUUAAUUUAAUAAAGCAUG-3′;
(SEQ ID NO: 53) NREU7C: 5′-CCGGCUAGCAAUUGUAUA C AUUAAUUUAAUAAAGCAUG-3′;
(SEQ ID NO: 54) NREA8C: 5′-CCGGCUAGCAAUUGUAUAU C UUAAUUUAAUAAAGCAUG-3′;
(SEQ ID NO: 55) NREstem5: 5′-CCGGCUAGCAAUUGUAUAUAUUAAUAUAAUAAAGCAUG-3′;
(SEQ ID NO: 56) NREstem6: 5′-CCGGCUAGCAAUUGUAUAUAUUAAUAUAUUAAAGCAUG-3′;
(SEQ ID NO: 57) NREstem7: 5′-CCGGCUAGCAAUUGUAUAUAUUAAUAUAUAAAAGCAUG-3′;
(SEQ ID NO: 58) NREstem8: 5′-CCGGCUAGCAAUUGUAUAUAUUAAUAUAUACAAGCAUG-3′.

Yeast three-hybrid growth assays and β-galactosidase assays were carried out according to Example 1.

Results

To determine the general applicability of the selected sequences to design PUF domains for binding to desired RNA sequences, a set of eight PUM1 mutants was prepared wherein repeats 1 to 8 of PUM1 were each altered to have a glycine at position 12 and an arginine at position 16 (GR). These engineered PUF domains were found to specifically bind to an RNA target with cytosine, but not adenine, guanine or uracil, at the position corresponding to the mutated repeat, as assessed by yeast three-hybrid growth assays and β-galactosidase assays (FIG. 8 a). All of these engineered PUF domains were found to specifically bind to their engineered RNA target with higher affinity than the wild-type, non-cytosine-containing RNA target.
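The correspondence between a mutated repeat and the cytosine-containing target it is tested against can be sketched as follows. The sketch assumes an antiparallel arrangement in which repeat r contacts base 9 − r of the eight-nucleotide recognition sequence; this agrees with Example 1 (repeat six and the NREU3C target) but is otherwise an assumption, as are the UGUAUAUA recognition sequence and the hypothetical function name `target_for_repeat`.

```python
SITE = "UGUAUAUA"  # assumed PUF recognition sequence of the NRE, 5' to 3'


def target_for_repeat(repeat: int, site: str = SITE) -> str:
    """Name the cytosine-substituted target matching a GR-mutated repeat.

    Assumes repeat r contacts base (len(site) + 1 - r), counted 5' to 3',
    so that repeat 8 faces base 1 and repeat 1 faces base 8.
    """
    if not 1 <= repeat <= len(site):
        raise ValueError("repeat number out of range")
    position = len(site) + 1 - repeat
    wild_type_base = site[position - 1]
    return f"NRE{wild_type_base}{position}C"


# List the expected target for each of the eight GR mutants.
for repeat in range(1, 9):
    print(f"repeat {repeat} (GR) -> {target_for_repeat(repeat)}")
```

Under this assumption each of the eight mutants maps to exactly one of the NREU1C to NREA8C targets listed in the Materials and Methods of this example.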

Furthermore, the PUF domain of human PUM1 was found to bind RNA in which the sequence downstream of the NRE was modified sequentially to place the NRE in increasingly base-paired structures. This PUF domain of human PUM1 was found to be able to bind all the RNA targets (FIG. 8 b) including one in which every base was paired in a stem, albeit with less efficiency. This indicates the potential of the recombinant polypeptides of the invention to invade structured RNAs to bind to their target sequences, which is relevant to the rational engineering of these PUF domains.
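The progressive base pairing of the NREstem targets can be illustrated by counting, for each target, how many bases of the recognition sequence face a Watson-Crick partner in the modified downstream sequence. The following Python sketch uses a deliberately simple hairpin model and is not a structure prediction: the UGUAUAUA recognition sequence, the four-nucleotide UUAA loop and the function name `paired_bases` are assumptions made for illustration.

```python
NRE_SITE = "UGUAUAUA"  # assumed recognition sequence, 5' to 3'
LOOP = "UUAA"          # assumed loop between the two strands of the stem

# Target sequences as listed in the Materials and Methods of this example.
TARGETS = {
    "NRE":      "CCGGCUAGCAAUUGUAUAUAUUAAUUUAAUAAAGCAUG",
    "NREstem5": "CCGGCUAGCAAUUGUAUAUAUUAAUAUAAUAAAGCAUG",
    "NREstem6": "CCGGCUAGCAAUUGUAUAUAUUAAUAUAUUAAAGCAUG",
    "NREstem7": "CCGGCUAGCAAUUGUAUAUAUUAAUAUAUAAAAGCAUG",
    "NREstem8": "CCGGCUAGCAAUUGUAUAUAUUAAUAUAUACAAGCAUG",
}

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def paired_bases(rna: str, site: str = NRE_SITE, loop: str = LOOP) -> int:
    """Count site bases facing a Watson-Crick partner in a simple hairpin."""
    start = rna.index(site + loop)
    downstream_start = start + len(site) + len(loop)
    downstream = rna[downstream_start:downstream_start + len(site)]
    # In a hairpin, the first downstream base faces the last base of the site.
    return sum((s, d) in WATSON_CRICK
               for s, d in zip(reversed(site), downstream))


pair_counts = {name: paired_bases(rna) for name, rna in TARGETS.items()}
for name, count in pair_counts.items():
    print(name, count)
```

Under these assumptions the count rises from 4 for the unmodified NRE to 5, 6, 7 and 8 for NREstem5 to NREstem8, which is consistent with the target names and with the description of NREstem8 as having every base paired.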

EXAMPLE 3 Materials and Methods Plasmids

Plasmids expressing individual PUF domain mutants were prepared according to Example 1, except that to make a 16 repeat PUF protein (PUF×2), repeats 1-8 of the human PUM1 cDNA were amplified using primers that incorporated flanking SacI sites, digested with SacI and cloned into an engineered SacI site that encodes amino acids 1030 and 1031 of the synthetic gene encoding the PUM1 PUF domain. RNA expression plasmids were also prepared according to Example 1, except that pairs of annealed oligonucleotides corresponding to the following RNA sequences were used:

NREx2 (SEQ ID NO: 59): 5′-CCGGCUAGCAAUUGUUGUAUAUAAUAUAUUAAUUUAAUAAAGCAUG-3′;
NREx2mut1 (SEQ ID NO: 60): 5′-CCGGCUAGCAAUCCCUGUAUAUAAUAUAUUAAUUUAAUAAAGCAUG-3′;
NREx2mut2 (SEQ ID NO: 61): 5′-CCGGCUAGCAAUUGUCCCCUAUAAUAUAUUAAUUUAAUAAAGCAUG-3′.

Yeast three-hybrid growth assays and β-galactosidase assays were carried out according to Example 1.

Results

Naturally occurring PUF proteins typically contain eight RNA-binding repeats. Although this is sufficient for them to selectively regulate specific developmental processes, they often do so by binding multiple different RNAs (Gerber, A. P., Herschlag, D. & Brown, P. O. PLoS Biol 2, E79 (2004)). For many applications in biotechnology, synthetic biology and medicine it would be highly desirable to be able to target only one species of RNA within an entire transcriptome. To achieve such levels of sequence discrimination, we engineered PUFs with 16 RNA-binding repeats. We inserted sequences encoding only the RNA-binding PUF repeats, without flanking regions, from the human PUM1 cDNA between repeats five and six of a synthetic gene that encodes the same protein sequence as the PUM1 cDNA but is only 78% similar at the DNA level, to avoid potential instability of the recombinant DNA. Because the C. elegans FBF-1 and FBF-2 PUF proteins contain a short insertion close to the end of repeat five, we reasoned that this region might tolerate the insertion of extra PUF repeats. The extended PUF bound to its cognate extended RNA target in yeast and activated transcription of the β-galactosidase reporter gene more efficiently than the eight-repeat PUF with its cognate RNA (FIG. 9, upper panel). Both the inserted and the flanking PUF repeats contributed to the binding affinity and selectivity, as separately mutating the UGU triplets recognized by each set of repeats significantly decreased β-galactosidase activity and growth on selective media (FIG. 9, lower panel). Engineered PUF domain proteins containing 16 RNA-binding repeats provide the means to selectively bind RNAs in higher eukaryotes that have more complex transcriptomes.
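The gain in sequence discrimination from doubling the number of repeats can be illustrated with a rough calculation of how often a recognition sequence of a given length would be expected to occur by chance. The following Python sketch assumes independent, equiprobable bases and a hypothetical transcriptome size of 10⁸ nucleotides; both are simplifications chosen for illustration, and the function name `expected_chance_sites` is hypothetical.

```python
def expected_chance_sites(site_length: int, transcriptome_nt: float) -> float:
    """Expected number of chance matches to one specific site.

    Assumes independent, equiprobable bases, so each position matches a
    given site of length n with probability 1 / 4**n. The number of
    possible start positions is approximated by the transcriptome size.
    """
    return transcriptome_nt / 4 ** site_length


TRANSCRIPTOME_NT = 1e8  # hypothetical total transcriptome size in nucleotides

for repeats in (8, 16):
    possible_sites = 4 ** repeats  # distinct sequences of that length
    chance_hits = expected_chance_sites(repeats, TRANSCRIPTOME_NT)
    print(f"{repeats} repeats: {possible_sites} possible sites, "
          f"about {chance_hits:.3g} chance occurrences")
```

With these illustrative numbers an eight-nucleotide site would be expected roughly 1,500 times by chance, whereas a 16-nucleotide site would be expected about 0.02 times, which is the sense in which 16 repeats can approach recognition of a single RNA species.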

Surprisingly, the inventors of this application have successfully engineered a recombinant PUF domain capable of specifically binding cytosine, and not adenine, guanine or uracil, a function of which wild-type PUM1 protein is not capable.

The recombinant polypeptides and fusion proteins of the invention provide alternative agents potentially useful in the specific and targeted regulation of gene expression, and the methods of the invention provide alternative methods of potentially specifically regulating gene expression. The recombinant polypeptides, fusion proteins, and isolated nucleic acids of the invention are not subject to the same design constraints as are RNAi agents. Furthermore, the binding of the recombinant polypeptides and fusion proteins of the invention is sequence-specific and not limited to Watson-Crick base pairing, minimising non-specific binding to non-target RNA molecules and thus off-target effects. In contrast to base pairing, each PUF repeat typically recognizes a single RNA base through three conserved side chains, two that make hydrogen bond or van der Waals interactions with the edge of an RNA base and a third side chain that stacks with the same base and/or the preceding base.

Further advantages of the use of the recombinant polypeptides and fusion proteins of the invention include the ease with which they may be introduced into cells without the need for immunogenic transfection reagents or delivery vehicles, increased efficiency of introduction into cells, independence from processing by RNAi machinery, and the scope for regulating gene expression by transcript degradation as well as by translation inhibition. The fusion proteins of the invention allow for the convenient delivery into cells of a single entity capable of both specific targeting and binding of a target RNA molecule as well as its processing, inhibition, or transport.

1. A recombinant polypeptide comprising at least one PUF RNA-binding domain capable of specifically binding to a cytosine RNA base.
 2. The polypeptide according to claim 1, wherein the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is selected from the group including glutamine (Q), valine (V), methionine (M), proline (P), glutamic acid (E), and lysine (K); X₂ is selected from the group including histidine (H), phenylalanine (F), tyrosine (Y), and asparagine (N); X₃ is selected from the group including glycine (G) and alanine (A); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is selected from the group including arginine (R), tyrosine (Y), histidine (H), and asparagine (N); X₆ is selected from the group including phenylalanine (F), leucine (L), and valine (V); X₇ is selected from the group including isoleucine (I), leucine (L), and valine (V); X₈ is arginine (R); X₉ is selected from the group including leucine (L), lysine (K), arginine (R), glutamine (Q), and histidine (H); X₁₀ is selected from the group including lysine (K), phenylalanine (F), alanine (A), cysteine (C), isoleucine (I), valine (V), leucine (L), and methionine (M); and X₁₁ is selected from the group including leucine (L), phenylalanine (F), isoleucine (I), and valine (V); and wherein the RNA base-binding motif is operably capable of specifically binding to a cytosine RNA base.
 3. The polypeptide according to claim 2, wherein the PUF RNA-binding domain comprises at least one RNA base-binding motif of the general formula X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁ wherein X₁ is glutamine (Q); X₂ is tyrosine (Y); X₃ is glycine (G); X₄ is selected from the group including glycine (G), alanine (A), serine (S), threonine (T) and cysteine (C); X₅ is tyrosine (Y); X₆ is valine (V); X₇ is isoleucine (I); X₈ is arginine (R); X₉ is histidine (H); X₁₀ is valine (V); and X₁₁ is leucine (L).
 4.-8. (canceled)
 9. The polypeptide according to claim 2, wherein the PUF RNA-binding domain comprises a plurality of RNA base-binding motifs, at least one of which is capable of specifically binding to a cytosine RNA base.
 10. The polypeptide according to claim 2, wherein the PUF RNA-binding domain comprises a plurality of consecutively ordered RNA base-binding motifs synergistically operable to bind a target RNA molecule with a target RNA sequence, each RNA base-binding motif capable of specifically binding to a cytosine, adenosine, guanine, or uracil RNA base, wherein the consecutive order of the RNA base-binding motifs corresponds with the consecutive order of the RNA bases in the target RNA sequence.
 11. The polypeptide according to claim 10, wherein the plurality of consecutively ordered RNA base-binding motifs comprises at least one first RNA base-binding motif having a sequence of any one of SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5 and at least one second RNA base-binding motif having a sequence of any one of SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12, and SEQ ID NO: 13.
 12. (canceled)
 13. The polypeptide according to claim 10, wherein the plurality of RNA base-binding motifs are operably linked via amino acid spacers.
 14. The polypeptide according to claim 13, wherein the amino acid spacers are derived from SEQ ID NO: 39, or part thereof.
 15. The polypeptide according to claim 2, wherein the recombinant polypeptide has a sequence of any one of SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18; or any one of SEQ ID NO: 24, SEQ ID NO: 25, SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, SEQ ID NO: 30 and SEQ ID NO: 41.
 16. A fusion protein comprising at least one PUF RNA-binding domain capable of specifically binding to a cytosine RNA base, and an effector domain.
 17. The fusion protein according to claim 16, wherein the effector domain is selected from the group comprising Endonucleases; proteins and protein domains capable of stimulating RNA cleavage; Exonucleases; Deadenylases; proteins and protein domains having nonsense mediated RNA decay activity; proteins and protein domains capable of stabilizing RNA; proteins and protein domains capable of repressing translation; proteins and protein domains capable of stimulating translation; proteins and protein domains capable of polyadenylation of RNA; proteins and protein domains capable of polyuridinylation of RNA; proteins and protein domains having RNA localization activity; proteins and protein domains capable of nuclear retention of RNA; proteins and protein domains having RNA nuclear export activity; proteins and protein domains capable of repression of RNA splicing; proteins and protein domains capable of stimulation of RNA splicing; proteins and protein domains capable of reducing the efficiency of transcription; and proteins and protein domains capable of stimulating transcription.
 18. (canceled)
 19. The fusion protein comprising at least one PUF RNA-binding domain capable of specifically binding to a cytosine RNA base, an effector domain and at least one polypeptide according to claim 2.
 20. An isolated nucleic acid encoding the recombinant polypeptide according to claim 2, or the fusion protein comprising at least one PUF RNA-binding domain capable of specifically binding to a cytosine RNA base, and an effector domain.
 21. The isolated nucleic acid according to claim 20, wherein the nucleic acid has a sequence selected from the group comprising any one of SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37 and SEQ ID NO: 40, and a sequence at least 80% homologous to any one of SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37 and SEQ ID NO: 40.
 22. A recombinant vector comprising nucleic acid encoding the recombinant polypeptide according to claim 2, or the fusion protein comprising at least one PUF RNA-binding domain capable of specifically binding to a cytosine RNA base, and an effector domain.
 23. A recombinant vector according to claim 22, wherein the nucleic acid has a sequence selected from the group comprising any one of SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37 and SEQ ID NO: 40, and a sequence at least 80% homologous to any one of SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 31, SEQ ID NO: 32, SEQ ID NO: 33, SEQ ID NO: 34, SEQ ID NO: 35, SEQ ID NO: 36, SEQ ID NO: 37 and SEQ ID NO: 40.
 24.-25. (canceled)
 26. A composition comprising the recombinant polypeptide according to claim 2 or the fusion protein comprising at least one PUF RNA-binding domain capable of specifically binding to a cytosine RNA base, and an effector domain.
 27. (canceled)
 28. Use of an effective amount of the recombinant polypeptide according to claim 2 or the fusion protein comprising at least one PUF RNA-binding domain capable of specifically binding to a cytosine RNA base, and an effector domain.
 29. (canceled)
 30. A system for regulating gene expression comprising (a) a modular set of isolated nucleic acids encoding a plurality of RNA base-binding motifs, the set including: at least one isolated nucleic acid encoding a RNA base-binding motif capable of specifically binding to a cytosine RNA base and at least one isolated nucleic acid encoding a RNA base-binding motif capable of specifically binding to an adenosine RNA base or a guanine RNA base or a uracil RNA base; (b) means for annealing the isolated nucleic acids of the modular set in a desired sequence to produce an isolated nucleic acid encoding an expressable recombinant polypeptide comprising a PUF RNA-binding domain having a plurality of consecutively ordered RNA base-binding motifs; and (c) a target RNA molecule with a target RNA sequence, wherein the consecutive order of the RNA base-binding motifs corresponds with the consecutive order of the RNA bases in the target RNA sequence.
 31. A kit for regulating gene expression comprising (a) a modular set of isolated nucleic acids encoding a plurality of RNA base-binding motifs, the set including: at least one isolated nucleic acid encoding a RNA base-binding motif capable of specifically binding to a cytosine RNA base and at least one isolated nucleic acid encoding a RNA base-binding motif capable of specifically binding to an adenosine RNA base or a guanine RNA base or a uracil RNA base; (b) means for annealing the isolated nucleic acids of the modular set in a desired sequence to produce an isolated nucleic acid encoding a recombinant polypeptide comprising a PUF RNA-binding domain having a plurality of consecutively ordered RNA base-binding motifs; and (c) optionally, a target RNA molecule with a target RNA sequence, wherein the consecutive order of the RNA base-binding motifs corresponds with the consecutive order of the RNA bases in the target RNA sequence. 